[net-next,V6,2/2] net/mlx4_en: Use affinity hint

Message ID 1401029247-15196-3-git-send-email-amirv@mellanox.com
State Accepted, archived
Delegated to: David Miller

Commit Message

Amir Vadai May 25, 2014, 2:47 p.m. UTC
From: Yuval Atias <yuvala@mellanox.com>

The “affinity hint” mechanism lets a driver indicate a preferred CPU mask
for its IRQs to the user space daemon, irqbalance.
irqbalance can use this hint to balance the IRQs between the
CPUs indicated by the mask.

We want the HCA to preferentially map the IRQs it uses to NUMA cores
close to it.  To accomplish this, we use cpumask_set_cpu_local_first(), which
sets the affinity hint according to the following policy:
first it maps IRQs to “close” NUMA cores; if these are exhausted, the
remaining IRQs are mapped to “far” NUMA cores.

Signed-off-by: Yuval Atias <yuvala@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
 drivers/infiniband/hw/mlx4/main.c              |  2 +-
 drivers/net/ethernet/mellanox/mlx4/en_cq.c     |  6 +++++-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 30 ++++++++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx4/eq.c        | 13 ++++++++++-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |  1 +
 include/linux/mlx4/device.h                    |  2 +-
 6 files changed, 50 insertions(+), 4 deletions(-)
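
The local-first policy described in the commit message can be illustrated
with a small userspace sketch. This is not the kernel implementation of
cpumask_set_cpu_local_first(); the name pick_cpu_local_first, the cpu_node
array standing in for the NUMA topology, and the wrap-around for ring
indexes beyond the CPU count are assumptions made for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical helper: cpu_node[c] is the NUMA node of CPU c.
 * Returns the CPU for the i-th ring, taking CPUs on local_node first
 * (in ascending order) and the remaining CPUs afterwards.
 * Returns -1 on invalid input.
 */
int pick_cpu_local_first(int i, int local_node, const int *cpu_node, int ncpus)
{
	int c, seen = 0;

	if (i < 0 || ncpus <= 0 || cpu_node == NULL)
		return -1;

	/* Assumption of this sketch: indexes wrap around the CPU count. */
	i %= ncpus;

	/* Pass 1: "close" CPUs, i.e. those on the device's node. */
	for (c = 0; c < ncpus; c++)
		if (cpu_node[c] == local_node && seen++ == i)
			return c;

	/* Pass 2: "far" CPUs, used once the close ones are exhausted. */
	for (c = 0; c < ncpus; c++)
		if (cpu_node[c] != local_node && seen++ == i)
			return c;

	return -1;
}
```

With four CPUs on nodes {0, 1, 0, 1} and a device on node 1, rings 0 and 1
would be hinted at CPUs 1 and 3, and rings 2 and 3 at CPUs 0 and 2.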

Comments

Eric Dumazet June 2, 2014, 4:16 a.m. UTC | #1
On Sun, 2014-05-25 at 17:47 +0300, Amir Vadai wrote:
> From: Yuval Atias <yuvala@mellanox.com>
> 
> The “affinity hint” mechanism is used by the user space
> daemon, irqbalancer, to indicate a preferred CPU mask for irqs.
> Irqbalancer can use this hint to balance the irqs between the
> cpus indicated by the mask.
> 
> We wish the HCA to preferentially map the IRQs it uses to numa cores
> close to it.  To accomplish this, we use cpumask_set_cpu_local_first(), that
> sets the affinity hint according the following policy:
> First it maps IRQs to “close” numa cores.  If these are exhausted, the
> remaining IRQs are mapped to “far” numa cores.
> 
> Signed-off-by: Yuval Atias <yuvala@mellanox.com>
> Signed-off-by: Amir Vadai <amirv@mellanox.com>
> ---

  CC [M]  drivers/net/ethernet/mellanox/mlx4/en_netdev.o
drivers/net/ethernet/mellanox/mlx4/en_netdev.c: In function ‘mlx4_en_init_affinity_hint’:
drivers/net/ethernet/mellanox/mlx4/en_netdev.c:1546:23: error: incompatible types when assigning to type ‘cpumask_var_t’ from type ‘void *’
drivers/net/ethernet/mellanox/mlx4/en_netdev.c: In function ‘mlx4_en_free_affinity_hint’:
drivers/net/ethernet/mellanox/mlx4/en_netdev.c:1553:41: error: incompatible types when assigning to type ‘cpumask_var_t’ from type ‘void *’


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet June 2, 2014, 4:37 a.m. UTC | #2
On Sun, 2014-06-01 at 21:16 -0700, Eric Dumazet wrote:
> On Sun, 2014-05-25 at 17:47 +0300, Amir Vadai wrote:
> > From: Yuval Atias <yuvala@mellanox.com>
> > 
> > The “affinity hint” mechanism is used by the user space
> > daemon, irqbalancer, to indicate a preferred CPU mask for irqs.
> > Irqbalancer can use this hint to balance the irqs between the
> > cpus indicated by the mask.
> > 
> > We wish the HCA to preferentially map the IRQs it uses to numa cores
> > close to it.  To accomplish this, we use cpumask_set_cpu_local_first(), that
> > sets the affinity hint according the following policy:
> > First it maps IRQs to “close” numa cores.  If these are exhausted, the
> > remaining IRQs are mapped to “far” numa cores.
> > 
> > Signed-off-by: Yuval Atias <yuvala@mellanox.com>
> > Signed-off-by: Amir Vadai <amirv@mellanox.com>
> > ---
> 
>   CC [M]  drivers/net/ethernet/mellanox/mlx4/en_netdev.o
> drivers/net/ethernet/mellanox/mlx4/en_netdev.c: In function ‘mlx4_en_init_affinity_hint’:
> drivers/net/ethernet/mellanox/mlx4/en_netdev.c:1546:23: error: incompatible types when assigning to type ‘cpumask_var_t’ from type ‘void *’
> drivers/net/ethernet/mellanox/mlx4/en_netdev.c: In function ‘mlx4_en_free_affinity_hint’:
> drivers/net/ethernet/mellanox/mlx4/en_netdev.c:1553:41: error: incompatible types when assigning to type ‘cpumask_var_t’ from type ‘void *’


And:

ERROR: "cpumask_set_cpu_local_first" [drivers/net/ethernet/mellanox/mlx4/mlx4_en.ko] undefined!


$ git grep -n cpumask_set_cpu_local_first
drivers/net/ethernet/mellanox/mlx4/en_netdev.c:1542:    if (cpumask_set_cpu_local_first(ring_idx, numa_node,
include/linux/cpumask.h:260:int cpumask_set_cpu_local_first(int i, int numa_node, cpumask_t *dstp);
lib/cpumask.c:168: * cpumask_set_cpu_local_first - set i'th cpu with local numa cpu's first
lib/cpumask.c:182:int cpumask_set_cpu_local_first(int i, int numa_node, cpumask_t *dstp)
lib/cpumask.c:228:EXPORT_SYMBOL(cpumask_set_cpu_local_first);

Fixes are needed if CONFIG_CPUMASK_OFFSTACK is not used.
	
$ grep CONFIG_CPUMASK_OFFSTACK .config
$ echo $?
1
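
The type errors follow from how cpumask_var_t is defined: with
CONFIG_CPUMASK_OFFSTACK it is a pointer to struct cpumask, otherwise it is
a one-element array, and an array cannot be assigned NULL. Below is a
minimal userspace sketch of the array case, with a validity flag standing
in for the NULL test. struct ring, affinity_valid and the ring_* helpers
are hypothetical names, and this is only one possible workaround, not
necessarily what a fixed patch would do.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct cpumask. */
struct cpumask {
	unsigned long bits[1];
};

/*
 * Mirrors the kernel definition used without CONFIG_CPUMASK_OFFSTACK:
 * a one-element array. With the option set it is "struct cpumask *".
 */
typedef struct cpumask cpumask_var_t[1];

struct ring {
	cpumask_var_t affinity_mask;
	bool affinity_valid;	/* hypothetical flag, replaces "mask = NULL" */
};

void ring_init_hint(struct ring *r, int cpu)
{
	memset(r->affinity_mask, 0, sizeof(r->affinity_mask));
	r->affinity_valid = false;
	if (cpu < 0 || cpu >= (int)(8 * sizeof(unsigned long)))
		return;
	r->affinity_mask[0].bits[0] = 1UL << cpu;
	r->affinity_valid = true;
}

void ring_clear_hint(struct ring *r)
{
	/* "r->affinity_mask = NULL;" would not compile: arrays are not assignable. */
	r->affinity_valid = false;
}

/* Returns the mask to pass as a hint, or NULL when no hint is set. */
const struct cpumask *ring_hint(const struct ring *r)
{
	return r->affinity_valid ? r->affinity_mask : NULL;
}
```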


David Miller June 2, 2014, 4:54 a.m. UTC | #3
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 01 Jun 2014 21:16:50 -0700

>   CC [M]  drivers/net/ethernet/mellanox/mlx4/en_netdev.o
> drivers/net/ethernet/mellanox/mlx4/en_netdev.c: In function ‘mlx4_en_init_affinity_hint’:
> drivers/net/ethernet/mellanox/mlx4/en_netdev.c:1546:23: error: incompatible types when assigning to type ‘cpumask_var_t’ from type ‘void *’
> drivers/net/ethernet/mellanox/mlx4/en_netdev.c: In function ‘mlx4_en_free_affinity_hint’:
> drivers/net/ethernet/mellanox/mlx4/en_netdev.c:1553:41: error: incompatible types when assigning to type ‘cpumask_var_t’ from type ‘void *’

What configuration/compiler combination generates this warning?  I didn't
see it with allmodconfig.
Eric Dumazet June 2, 2014, 5:13 a.m. UTC | #4
On Sun, 2014-06-01 at 21:56 -0700, David Miller wrote:

> Indeed you have to provide a dummy version for a non-SMP build etc.
> 
> I'm reverting.
> 

Hi David. I think your revert took one wrong commit.


# git show ee39facbf82e73e468c504d2b40e83e2d223c28c | diffstat -p1 -w70
 drivers/net/ethernet/micrel/ks8851.c |   50 ++++++++++---------
 include/linux/cpumask.h              |    2 
 lib/cpumask.c                        |   64 -------------------------
 3 files changed, 28 insertions(+), 88 deletions(-)



David Miller June 2, 2014, 7:10 a.m. UTC | #5
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 01 Jun 2014 22:13:12 -0700

> On Sun, 2014-06-01 at 21:56 -0700, David Miller wrote:
> 
>> Indeed you have to provide a dummy version for a non-SMP build etc.
>> 
>> I'm reverting.
>> 
> 
> Hi David. I think your revert took one wrong commit.

Thanks I'll fix it up.
Amir Vadai June 2, 2014, 2:08 p.m. UTC | #6
On 6/2/2014 8:13 AM, Eric Dumazet wrote:
> On Sun, 2014-06-01 at 21:56 -0700, David Miller wrote:
>
>> Indeed you have to provide a dummy version for a non-SMP build etc.
>>
>> I'm reverting.
>>
>
> Hi David. I think your revert took one wrong commit.
>
>
> # git show ee39facbf82e73e468c504d2b40e83e2d223c28c | diffstat -p1 -w70
>   drivers/net/ethernet/micrel/ks8851.c |   50 ++++++++++---------
>   include/linux/cpumask.h              |    2
>   lib/cpumask.c                        |   64 -------------------------
>   3 files changed, 28 insertions(+), 88 deletions(-)
>
>
>

Hi,

Yes, Eric is right: it seems that 2a82e40 "net: ks8851: Don't use
regulator_get_optional()" was reverted by mistake instead of 70a640d
"net/mlx4_en: Use affinity hint".

I'm working on a fixed version of the affinity patches - this time I 
will double check the CONFIG_SMP/CONFIG_CPUMASK_OFFSTACK combinations.
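
One way to cover the non-SMP combination is a header-level stub for builds
where the out-of-line helper is not compiled. The following is a userspace
sketch under that assumption; MOCK_SMP, the simplified struct cpumask and
the mock_set_cpu_local_first name are placeholders for illustration, not
kernel identifiers.

```c
/* Simplified stand-in for the kernel's struct cpumask (one word of CPU bits). */
struct cpumask {
	unsigned long bits[1];
};

#ifdef MOCK_SMP
/* SMP build: the helper would be compiled out of line elsewhere. */
int mock_set_cpu_local_first(int i, int numa_node, struct cpumask *dstp);
#else
/* Non-SMP build: only CPU 0 exists, so every ring maps to it. */
static inline int mock_set_cpu_local_first(int i, int numa_node,
					   struct cpumask *dstp)
{
	(void)i;
	(void)numa_node;
	dstp->bits[0] |= 1UL;
	return 0;
}
#endif
```

Callers then build unchanged in both configurations, because the stub keeps
the same signature and return convention as the out-of-line version.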

I'm preparing a public git tree with Mellanox updates, so that Mellanox
driver patches will pass 0-DAY kernel build testing before landing in
net-next.

Amir

Patch

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index a9638ae..8c88960 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1837,7 +1837,7 @@  static void mlx4_ib_alloc_eqs(struct mlx4_dev *dev, struct mlx4_ib_dev *ibdev)
 				 i, j, dev->pdev->bus->name);
 			/* Set IRQ for specific name (per ring) */
 			if (mlx4_assign_eq(dev, name, NULL,
-					   &ibdev->eq_table[eq])) {
+					   &ibdev->eq_table[eq], NULL)) {
 				/* Use legacy (same as mlx4_en driver) */
 				pr_warn("Can't allocate EQ %d; reverting to legacy\n", eq);
 				ibdev->eq_table[eq] =
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_cq.c b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
index 636963d..ea2cd72 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
@@ -118,11 +118,15 @@  int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
 	if (cq->is_tx == RX) {
 		if (mdev->dev->caps.comp_pool) {
 			if (!cq->vector) {
+				struct mlx4_en_rx_ring *ring =
+					priv->rx_ring[cq->ring];
+
 				sprintf(name, "%s-%d", priv->dev->name,
 					cq->ring);
 				/* Set IRQ for specific name (per ring) */
 				if (mlx4_assign_eq(mdev->dev, name, rmap,
-						   &cq->vector)) {
+						   &cq->vector,
+						   ring->affinity_mask)) {
 					cq->vector = (cq->ring + 1 + priv->port)
 					    % mdev->dev->caps.num_comp_vectors;
 					mlx4_warn(mdev, "Failed assigning an EQ to %s, falling back to legacy EQ's\n",
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 5bb7eda..826d150 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1531,6 +1531,32 @@  static void mlx4_en_linkstate(struct work_struct *work)
 	mutex_unlock(&mdev->state_lock);
 }
 
+static void mlx4_en_init_affinity_hint(struct mlx4_en_priv *priv, int ring_idx)
+{
+	struct mlx4_en_rx_ring *ring = priv->rx_ring[ring_idx];
+	int numa_node = priv->mdev->dev->numa_node;
+
+	if (numa_node == -1)
+		return;
+
+	if (!zalloc_cpumask_var(&ring->affinity_mask, GFP_KERNEL)) {
+		en_err(priv, "Failed to allocate core mask\n");
+		return;
+	}
+
+	if (cpumask_set_cpu_local_first(ring_idx, numa_node,
+					ring->affinity_mask)) {
+		en_err(priv, "Failed setting affinity hint\n");
+		free_cpumask_var(ring->affinity_mask);
+		ring->affinity_mask = NULL;
+	}
+}
+
+static void mlx4_en_free_affinity_hint(struct mlx4_en_priv *priv, int ring_idx)
+{
+	free_cpumask_var(priv->rx_ring[ring_idx]->affinity_mask);
+	priv->rx_ring[ring_idx]->affinity_mask = NULL;
+}
 
 int mlx4_en_start_port(struct net_device *dev)
 {
@@ -1572,6 +1598,8 @@  int mlx4_en_start_port(struct net_device *dev)
 
 		mlx4_en_cq_init_lock(cq);
 
+		mlx4_en_init_affinity_hint(priv, i);
+
 		err = mlx4_en_activate_cq(priv, cq, i);
 		if (err) {
 			en_err(priv, "Failed activating Rx CQ\n");
@@ -1852,6 +1880,8 @@  void mlx4_en_stop_port(struct net_device *dev, int detach)
 			msleep(1);
 		mlx4_en_deactivate_rx_ring(priv, priv->rx_ring[i]);
 		mlx4_en_deactivate_cq(priv, cq);
+
+		mlx4_en_free_affinity_hint(priv, i);
 	}
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c b/drivers/net/ethernet/mellanox/mlx4/eq.c
index 947364d..02cf97d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
@@ -1376,7 +1376,7 @@  int mlx4_test_interrupts(struct mlx4_dev *dev)
 EXPORT_SYMBOL(mlx4_test_interrupts);
 
 int mlx4_assign_eq(struct mlx4_dev *dev, char *name, struct cpu_rmap *rmap,
-		   int *vector)
+		   int *vector, cpumask_var_t cpu_hint_mask)
 {
 
 	struct mlx4_priv *priv = mlx4_priv(dev);
@@ -1411,6 +1411,15 @@  int mlx4_assign_eq(struct mlx4_dev *dev, char *name, struct cpu_rmap *rmap,
 			}
 			mlx4_assign_irq_notifier(priv, dev,
 						 priv->eq_table.eq[vec].irq);
+			if (cpu_hint_mask) {
+				err = irq_set_affinity_hint(
+						priv->eq_table.eq[vec].irq,
+						cpu_hint_mask);
+				if (err) {
+					mlx4_warn(dev, "Failed setting affinity hint\n");
+					/*we dont want to break here*/
+				}
+			}
 
 			eq_set_ci(&priv->eq_table.eq[vec], 1);
 		}
@@ -1441,6 +1450,8 @@  void mlx4_release_eq(struct mlx4_dev *dev, int vec)
 			irq_set_affinity_notifier(
 				priv->eq_table.eq[vec].irq,
 				NULL);
+			irq_set_affinity_hint(priv->eq_table.eq[vec].irq,
+					      NULL);
 			free_irq(priv->eq_table.eq[vec].irq,
 				 &priv->eq_table.eq[vec]);
 			priv->msix_ctl.pool_bm &= ~(1ULL << i);
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 88d5cf6..61d7c36 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -313,6 +313,7 @@  struct mlx4_en_rx_ring {
 	unsigned long csum_ok;
 	unsigned long csum_none;
 	int hwtstamp_rx_filter;
+	cpumask_var_t affinity_mask;
 };
 
 struct mlx4_en_cq {
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 74f5aa8..8b194aa 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -1161,7 +1161,7 @@  int mlx4_fmr_free(struct mlx4_dev *dev, struct mlx4_fmr *fmr);
 int mlx4_SYNC_TPT(struct mlx4_dev *dev);
 int mlx4_test_interrupts(struct mlx4_dev *dev);
 int mlx4_assign_eq(struct mlx4_dev *dev, char *name, struct cpu_rmap *rmap,
-		   int *vector);
+		   int *vector, cpumask_t *cpu_hint_mask);
 void mlx4_release_eq(struct mlx4_dev *dev, int vec);
 
 int mlx4_get_phys_port_id(struct mlx4_dev *dev);