Message ID:   20141001185622.GI17706@oracle.com
State:        Changes Requested, archived
Delegated to: David Miller
On Wed, 2014-10-01 at 14:56 -0400, Sowmini Varadhan wrote:
> A vnet_port_remove could be triggered as a result of an ldm-unbind operation
> by the peer, or a module unload, or other changes to the inter-vnet-link
> configuration. When this is concurrent with vnet_start_xmit(),
> the following sequence could occur:
>
>         thread 1                        thread 2
>
>     vnet_start_xmit
>     -> tx_port_find
>        spin_lock_irqsave(&vp->lock..)
>        ret = __tx_port_find(..)
>        spin_unlock_irqrestore(&vp->lock..)
>
>                                     vio_remove -> ..
>                                     -> vnet_port_remove
>                                        spin_lock_irqsave(&vp->lock..)
>                                        cleanup
>                                        spin_unlock_irqrestore(&vp->lock..)
>                                        kfree(port)
>
>     /* attempt to use ret will bomb */
>
> This patch avoids the problem by holding/releasing a refcnt on
> the vnet_port.

Hmpff... This calls for rcu protection here !

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/01/2014 02:56 PM, Sowmini Varadhan wrote:

> -	list_for_each_entry(port, &vp->port_list, list) {
> -		if (!port->switch_port)
> -			continue;
> -		if (!port_is_up(port))
> -			continue;
> +	port = vp->switch_port;
> +	if (port != NULL && port_is_up(port)) {
> +		vnet_hold(port);
> 		return port;
> 	}
> 	return NULL;

This "vp->switch_port" addition doesn't appear to be related to the port refcnt
change, and doesn't allow for multiple switch ports.

+-DLS

> @@ -1581,10 +1621,12 @@ static int vnet_port_probe(struct vio_dev *vdev, const struct vio_device_id *id)
> 	port->switch_port = switch_port;
>
> 	spin_lock_irqsave(&vp->lock, flags);
> -	if (switch_port)
> +	if (switch_port) {
> +		vp->switch_port = port;
> 		list_add(&port->list, &vp->port_list);

...and here.

> -	else
> +	} else {
> 		list_add_tail(&port->list, &vp->port_list);
> +	}
> 	hlist_add_head(&port->hash, &vp->port_hash[vnet_hashfn(port->raddr)]);
> 	spin_unlock_irqrestore(&vp->lock, flags);
>
> @@ -1631,11 +1673,14 @@ static int vnet_port_remove(struct vio_dev *vdev)
> 	 */
> 	spin_lock_irqsave(&vp->lock, flags);
> 	port->flags |= VNET_PORT_DEAD;
> +	if (port->switch_port)
> +		vp->switch_port = NULL;
> 	list_del(&port->list);
> 	hlist_del(&port->hash);

..and here.
On (10/01/14 15:06), David L Stevens wrote:
>
> This "vp->switch_port" addition doesn't appear to be related to the port refcnt
> change, and doesn't allow for multiple switch ports.

The switch_port is the connection to Dom0. Do you envision us having more than
one switch_port? How?

--Sowmini
On 10/01/2014 03:23 PM, Sowmini Varadhan wrote:
> On (10/01/14 15:06), David L Stevens wrote:
>>
>> This "vp->switch_port" addition doesn't appear to be related to the port refcnt
>> change, and doesn't allow for multiple switch ports.
>
> The switch_port is the connection to Dom0. Do you envision us having more than
> one switch_port? How?

While Dom0 might only create one port with the "switch" flag, the flag just means
"I can reach anybody" and is not inherently unique. I don't think an attached
VM should assume there is always only one; it prevents multipath load balancing
kinds of things in the future.

Also, there is the broader point that this sort of change should be a separate patch.
It isn't required for fixing the dangling reference -- it is an independent change.

+-DLS
On (10/01/14 12:05), Eric Dumazet wrote:
>
> Hmpff... This calls for rcu protection here !

I did consider that, but given that the lists containing the ports are
accessed in multiple contexts, some of which can sleep, and given that
the vnet port is similar in spirit to the net_device, I followed the
net_device model of dev_put etc.

--Sowmini
From: David L Stevens <david.stevens@oracle.com>
Date: Wed, 01 Oct 2014 15:31:49 -0400

> On 10/01/2014 03:23 PM, Sowmini Varadhan wrote:
>> On (10/01/14 15:06), David L Stevens wrote:
>>>
>>> This "vp->switch_port" addition doesn't appear to be related to the port refcnt
>>> change, and doesn't allow for multiple switch ports.
>>
>> The switch_port is the connection to Dom0. Do you envision us having more than
>> one switch_port? How?
>
> While Dom0 might only create one port with the "switch" flag, the flag just means
> "I can reach anybody" and is not inherently unique. I don't think an attached
> VM should assume there is always only one; it prevents multipath load balancing
> kinds of things in the future.
>
> Also, there is the broader point that this sort of change should be a separate patch.
> It isn't required for fixing the dangling reference -- it is an independent change.

Multiple switch ports are absolutely allowed by the protocol spec and can
provide the suggested facilities David mentioned; don't prevent them from
being used.
On Wed, 2014-10-01 at 15:44 -0400, Sowmini Varadhan wrote:
> On (10/01/14 12:05), Eric Dumazet wrote:
> >
> > Hmpff... This calls for rcu protection here !
>
> I did consider that, but given that the lists containing the ports are
> accessed in multiple contexts, some of which can sleep, and given that
> the vnet port is similar in spirit to the net_device, I followed the
> net_device model of dev_put etc.

dev_put() uses per cpu variables, an order of magnitude faster than an
atomic put/get :(
On 10/01/2014 04:15 PM, Eric Dumazet wrote:
> On Wed, 2014-10-01 at 15:44 -0400, Sowmini Varadhan wrote:
>> On (10/01/14 12:05), Eric Dumazet wrote:
>>>
>>> Hmpff... This calls for rcu protection here !
>>
>> I did consider that, but given that the lists containing the ports are
>> accessed in multiple contexts, some of which can sleep, and given that
>> the vnet port is similar in spirit to the net_device, I followed the
>> net_device model of dev_put etc.
>
> dev_put() uses per cpu variables, an order of magnitude faster than an
> atomic put/get :(

I could make this a per-cpu variable; it would not be too hard to change
vnet_put/hold etc. That's not a problem.

--Sowmini
From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Wed, 01 Oct 2014 16:19:41 -0400

> On 10/01/2014 04:15 PM, Eric Dumazet wrote:
>> On Wed, 2014-10-01 at 15:44 -0400, Sowmini Varadhan wrote:
>>> On (10/01/14 12:05), Eric Dumazet wrote:
>>>>
>>>> Hmpff... This calls for rcu protection here !
>>>
>>> I did consider that, but given that the lists containing the ports are
>>> accessed in multiple contexts, some of which can sleep, and given that
>>> the vnet port is similar in spirit to the net_device, I followed the
>>> net_device model of dev_put etc.
>>
>> dev_put() uses per cpu variables, an order of magnitude faster than an
>> atomic put/get :(
>
> I could make this a per-cpu variable; it would not be too hard to change
> vnet_put/hold etc. That's not a problem.

If you went with NAPI you could use RCU.

All of the negative and complicated aspects of your changes strictly
have to do with not going with that implementation route.
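[Editor's note] For reference, the RCU route suggested here would look roughly like the following in the driver. This is a hedged sketch, not the actual patch: it assumes the list mutators are converted to the `_rcu` list helpers, that `struct vnet_port` gains an `rcu_head`, and that the transmit path never sleeps while the port pointer is in use.

```c
/* Reader side (tx path): no lock and no per-packet refcount bump. */
rcu_read_lock();
hlist_for_each_entry_rcu(port, hp, hash) {
	if (port_is_up(port) &&
	    ether_addr_equal(port->raddr, skb->data)) {
		/* port may only be used inside this read-side section */
		break;
	}
}
rcu_read_unlock();

/* Remover side: unlink under the lock, then wait a grace period so no
 * reader can still hold a pointer before freeing. */
spin_lock_irqsave(&vp->lock, flags);
hlist_del_rcu(&port->hash);
list_del_rcu(&port->list);
spin_unlock_irqrestore(&vp->lock, flags);
synchronize_rcu();	/* or call_rcu(&port->rcu, vnet_port_free_cb) */
kfree(port);
```

This removes both the refcount atomics and the `vnet_wait_allrefs()` polling loop; the cost moves entirely to the (rare) remove path.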
On Oct 1, 2014, at 12:31 PM, David L Stevens <david.stevens@oracle.com> wrote:
> On 10/01/2014 03:23 PM, Sowmini Varadhan wrote:
>> On (10/01/14 15:06), David L Stevens wrote:
>>>
>>> This "vp->switch_port" addition doesn't appear to be related to the port refcnt
>>> change, and doesn't allow for multiple switch ports.
>>
>> The switch_port is the connection to Dom0. Do you envision us having more than
>> one switch_port? How?
>
> While Dom0 might only create one port with the "switch" flag, the flag just means
> "I can reach anybody" and is not inherently unique. I don't think an attached
> VM should assume there is always only one; it prevents multipath load balancing
> kinds of things in the future.

At the moment our architecture defines only one switch-port. We certainly toyed
with the idea of multiple paths (actually LDCs), mainly for performance, but we
were able to achieve our goal with one LDC, so that was not introduced. Our
original design included the idea of multiple LDCs in a port, but not multiple
ports, for performance purposes.

We also explored adding paths to different virtual switches, mainly intended for
path failover when one of them failed due to a service domain crash/panic. We
have such a feature in our virtual disk, where we realized it is a very
proprietary implementation and the cost of maintaining and improving those
features is high. We abandoned these proprietary solutions and instead rely on
the rich features that already exist in guest operating systems with multiple
network devices.

In summary, we currently have no plans for multiple paths. Even if we implement
multiple paths, we expect it is the choice of the guest how it utilizes those
paths, as it needs to have knowledge of these multiple paths and use them as
designed. That is, the driver implementation has to change. For now, I do not
see any issues with this driver assuming one switch-port. If we ever support
multiple paths, I would expect this driver to change; at that time the code can
implement an appropriate method of load balancing or failover.

-Raghuram

> Also, there is the broader point that this sort of change should be a separate patch.
> It isn't required for fixing the dangling reference -- it is an independent change.
>
> +-DLS
On Oct 1, 2014, at 12:52 PM, David Miller <davem@davemloft.net> wrote:
> From: David L Stevens <david.stevens@oracle.com>
> Date: Wed, 01 Oct 2014 15:31:49 -0400
>
>> While Dom0 might only create one port with the "switch" flag, the flag just means
>> "I can reach anybody" and is not inherently unique. I don't think an attached
>> VM should assume there is always only one; it prevents multipath load balancing
>> kinds of things in the future.
>>
>> Also, there is the broader point that this sort of change should be a separate patch.
>> It isn't required for fixing the dangling reference -- it is an independent change.
>
> Multiple switch ports are absolutely allowed by the protocol spec and can
> provide the suggested facilities David mentioned; don't prevent them from
> being used.

In reality, introducing multiple switch-ports will need a guest driver change as
well; the existing sunvnet driver will not automatically utilize all
switch-ports. When we add full support for multiple switch-ports, I am sure we
can replace the current switch-port lookup optimization with a different,
probably better optimized, method.

-Raghuram
diff --git a/drivers/net/ethernet/sun/sunvnet.c b/drivers/net/ethernet/sun/sunvnet.c
index e2aacf5..dca4397 100644
--- a/drivers/net/ethernet/sun/sunvnet.c
+++ b/drivers/net/ethernet/sun/sunvnet.c
@@ -47,6 +47,42 @@ MODULE_VERSION(DRV_MODULE_VERSION);
 
 static int __vnet_tx_trigger(struct vnet_port *port, u32 start);
 
+static inline int vnet_refcnt_read(const struct vnet_port *port)
+{
+	return atomic_read(&port->refcnt);
+}
+
+static inline void vnet_hold(struct vnet_port *port)
+{
+	atomic_inc(&port->refcnt);
+	BUG_ON(vnet_refcnt_read(port) == 0);
+}
+
+static void vnet_put(struct vnet_port *port)
+{
+	atomic_dec(&port->refcnt);
+}
+
+static void vnet_wait_allrefs(struct vnet_port *port)
+{
+	unsigned long warning_time;
+	int refcnt = vnet_refcnt_read(port);
+
+	warning_time = jiffies;
+	while (refcnt != 0) {
+		msleep(250);
+		refcnt = vnet_refcnt_read(port);
+		if (time_after(jiffies, warning_time + 10 * HZ)) {
+			pr_emerg("vnet_wait_allrefs: waiting for port "
+				 "%x:%x:%x:%x:%x:%x. Usage count = %d\n",
+				 port->raddr[0], port->raddr[1],
+				 port->raddr[2], port->raddr[3],
+				 port->raddr[4], port->raddr[5], refcnt);
+			warning_time = jiffies;
+		}
+	}
+}
+
 /* Ordered from largest major to lowest */
 static struct vio_version vnet_versions[] = {
 	{ .major = 1, .minor = 6 },
@@ -773,14 +809,14 @@ struct vnet_port *__tx_port_find(struct vnet *vp, struct sk_buff *skb)
 	hlist_for_each_entry(port, hp, hash) {
 		if (!port_is_up(port))
 			continue;
-		if (ether_addr_equal(port->raddr, skb->data))
+		if (ether_addr_equal(port->raddr, skb->data)) {
+			vnet_hold(port);
 			return port;
+		}
 	}
-	list_for_each_entry(port, &vp->port_list, list) {
-		if (!port->switch_port)
-			continue;
-		if (!port_is_up(port))
-			continue;
+	port = vp->switch_port;
+	if (port != NULL && port_is_up(port)) {
+		vnet_hold(port);
 		return port;
 	}
 	return NULL;
@@ -974,6 +1010,7 @@ static int vnet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 			dev->stats.tx_errors++;
 		}
 		spin_unlock_irqrestore(&port->vio.lock, flags);
+		vnet_put(port);
 		return NETDEV_TX_BUSY;
 	}
 
@@ -1071,6 +1108,7 @@ ldc_start_done:
 	vnet_free_skbs(freeskbs);
 
 	(void)mod_timer(&port->clean_timer, jiffies + VNET_CLEAN_TIMEOUT);
+	vnet_put(port);
 
 	return NETDEV_TX_OK;
 
@@ -1087,6 +1125,8 @@ out_dropped:
 	else
 		del_timer(&port->clean_timer);
 	dev->stats.tx_dropped++;
+	if (port)
+		vnet_put(port);
 	return NETDEV_TX_OK;
 }
 
@@ -1581,10 +1621,12 @@ static int vnet_port_probe(struct vio_dev *vdev, const struct vio_device_id *id)
 	port->switch_port = switch_port;
 
 	spin_lock_irqsave(&vp->lock, flags);
-	if (switch_port)
+	if (switch_port) {
+		vp->switch_port = port;
 		list_add(&port->list, &vp->port_list);
-	else
+	} else {
 		list_add_tail(&port->list, &vp->port_list);
+	}
 	hlist_add_head(&port->hash, &vp->port_hash[vnet_hashfn(port->raddr)]);
 	spin_unlock_irqrestore(&vp->lock, flags);
 
@@ -1631,11 +1673,14 @@ static int vnet_port_remove(struct vio_dev *vdev)
 	 */
 	spin_lock_irqsave(&vp->lock, flags);
 	port->flags |= VNET_PORT_DEAD;
+	if (port->switch_port)
+		vp->switch_port = NULL;
 	list_del(&port->list);
 	hlist_del(&port->hash);
 	spin_unlock_irqrestore(&vp->lock, flags);
 
 	vnet_workq_disable(port);
+	vnet_wait_allrefs(port);
 
 	vnet_port_free_tx_bufs(port);
 	vio_ldc_free(&port->vio);
diff --git a/drivers/net/ethernet/sun/sunvnet.h b/drivers/net/ethernet/sun/sunvnet.h
index 1182ec6..6f62ea5 100644
--- a/drivers/net/ethernet/sun/sunvnet.h
+++ b/drivers/net/ethernet/sun/sunvnet.h
@@ -53,6 +53,7 @@ struct vnet_port {
 	u32			stop_rx_idx;
 	bool			stop_rx;
 	bool			start_cons;
+	atomic_t		refcnt;
 
 	struct timer_list	clean_timer;
 
@@ -104,6 +105,7 @@ struct vnet {
 	u64		local_mac;
 
 	struct tasklet_struct	vnet_tx_wakeup;
+	struct vnet_port	*switch_port;
};

#endif /* _SUNVNET_H */