
[2/2] ixgbe, don't assume mapping of numa node cpus

Message ID 1393267913-28212-3-git-send-email-prarit@redhat.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Prarit Bhargava Feb. 24, 2014, 6:51 p.m. UTC
The ixgbe driver assumes that the cpus on a node are mapped 1:1 with the
indexes into arrays.  This is not the case as nodes can contain, for
example, cpus 0-7, 33-40.

This patch fixes this problem.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Bruce Allan <bruce.w.allan@intel.com>
Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
Cc: Don Skidmore <donald.c.skidmore@intel.com>
Cc: Greg Rose <gregory.v.rose@intel.com>
Cc: Alex Duyck <alexander.h.duyck@intel.com>
Cc: John Ronciak <john.ronciak@intel.com>
Cc: Mitch Williams <mitch.a.williams@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: nhorman@redhat.com
Cc: agospoda@redhat.com
Cc: e1000-devel@lists.sourceforge.net
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c |   16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

Comments

Duyck, Alexander H Feb. 24, 2014, 7:39 p.m. UTC | #1
On 02/24/2014 10:51 AM, Prarit Bhargava wrote:
> The ixgbe driver assumes that the cpus on a node are mapped 1:1 with the
> indexes into arrays.  This is not the case as nodes can contain, for
> example, cpus 0-7, 33-40.
> 
> This patch fixes this problem.
> 
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Cc: Bruce Allan <bruce.w.allan@intel.com>
> Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
> Cc: Don Skidmore <donald.c.skidmore@intel.com>
> Cc: Greg Rose <gregory.v.rose@intel.com>
> Cc: Alex Duyck <alexander.h.duyck@intel.com>
> Cc: John Ronciak <john.ronciak@intel.com>
> Cc: Mitch Williams <mitch.a.williams@intel.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: nhorman@redhat.com
> Cc: agospoda@redhat.com
> Cc: e1000-devel@lists.sourceforge.net
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c |   16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> index 3668288..8b3992e 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> @@ -794,11 +794,15 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
>  {
>  	struct ixgbe_q_vector *q_vector;
>  	struct ixgbe_ring *ring;
> -	int node = NUMA_NO_NODE;
> -	int cpu = -1;
> +	int node = adapter->pdev->dev.numa_node;
> +	int cpu, set_affinity = 0;
>  	int ring_count, size;
>  	u8 tcs = netdev_get_num_tc(adapter->netdev);
>  
> +	if (node == NUMA_NO_NODE)
> +		cpu = -1;
> +	else
> +		cpu = cpumask_next(v_idx - 1, cpumask_of_node(node));
>  	ring_count = txr_count + rxr_count;
>  	size = sizeof(struct ixgbe_q_vector) +
>  	       (sizeof(struct ixgbe_ring) * ring_count);

Are you sure this does what you think it does?  I thought the first
value is just the starting offset to check for a set bit, and I don't
think cpumask_next is aware of holes in a given mask.  So, for example,
if CPUs 8-31 are missing I think you will end up with all of the
remaining q_vectors being assigned to CPU 32, since that is the first
set bit after the hole.

What might work better here is a function that returns the local node
CPU IDs first, followed by the remote node CPU IDs if ATR is enabled.
We should probably have it wrap around in the case where the number of
queues is greater than the number of local CPUs but ATR is not enabled.
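Something along these lines is what I have in mind, purely as a sketch
to illustrate the idea (the helper name and the wrap-around policy are
made up for illustration, and the ATR gating is left out for brevity):

#include <linux/cpumask.h>
#include <linux/numa.h>
#include <linux/topology.h>

/* Sketch only: pick the v_idx-th online CPU, preferring the device's
 * local NUMA node and wrapping once every online CPU has been used. */
static int ixgbe_vector_cpu(int node, int v_idx)
{
	int cpu, i = 0;

	if (node == NUMA_NO_NODE)
		return -1;

	/* Wrap so a queue count larger than the CPU count still maps. */
	v_idx %= num_online_cpus();

	/* Pass 1: online CPUs local to the device's node, in mask order. */
	for_each_cpu_and(cpu, cpumask_of_node(node), cpu_online_mask)
		if (i++ == v_idx)
			return cpu;

	/* Pass 2: remaining online CPUs on remote nodes. */
	for_each_online_cpu(cpu) {
		if (cpumask_test_cpu(cpu, cpumask_of_node(node)))
			continue;
		if (i++ == v_idx)
			return cpu;
	}

	return -1;
}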

> @@ -807,10 +811,8 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
>  	if ((tcs <= 1) && !(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)) {
>  		u16 rss_i = adapter->ring_feature[RING_F_RSS].indices;
>  		if (rss_i > 1 && adapter->atr_sample_rate) {
> -			if (cpu_online(v_idx)) {
> -				cpu = v_idx;
> -				node = cpu_to_node(cpu);
> -			}
> +			if (likely(cpu_online(cpu)))
> +				set_affinity = 1;
>  		}
>  	}
>  

The node assignment is still needed here.  We need to be able to assign
queues to remote nodes, since the applications expecting the data may be
running there.  We have seen a serious performance degradation when
trying to feed an application from a remote queue, even if the queue is
local to the hardware.
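In other words, the rss_i/ATR branch should keep deriving the node from
the chosen CPU, roughly like this (a sketch of the idea, not the posted
patch):

	if (rss_i > 1 && adapter->atr_sample_rate) {
		if (likely(cpu_online(cpu))) {
			set_affinity = 1;
			/* Sketch: keep the node tied to the chosen CPU so a
			 * queue that lands on a remote CPU also allocates
			 * its memory on that CPU's node. */
			node = cpu_to_node(cpu);
		}
	}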

> @@ -822,7 +824,7 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
>  		return -ENOMEM;
>  
>  	/* setup affinity mask and node */
> -	if (cpu != -1)
> +	if (set_affinity)
>  		cpumask_set_cpu(cpu, &q_vector->affinity_mask);
>  	q_vector->numa_node = node;
>  
> 

I'm not sure what the point of this change is, other than that you
moved the cpu configuration earlier.  The affinity mask could be
configured with an offline CPU and it should have no negative effect.

Thanks,

Alex
Amir Vadai Feb. 25, 2014, 5:27 p.m. UTC | #2
On 24/02/14 13:51 -0500, Prarit Bhargava wrote:
> The ixgbe driver assumes that the cpus on a node are mapped 1:1 with the
> indexes into arrays.  This is not the case as nodes can contain, for
> example, cpus 0-7, 33-40.
> 
> This patch fixes this problem.
> 
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Cc: Bruce Allan <bruce.w.allan@intel.com>
> Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
> Cc: Don Skidmore <donald.c.skidmore@intel.com>
> Cc: Greg Rose <gregory.v.rose@intel.com>
> Cc: Alex Duyck <alexander.h.duyck@intel.com>
> Cc: John Ronciak <john.ronciak@intel.com>
> Cc: Mitch Williams <mitch.a.williams@intel.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: nhorman@redhat.com
> Cc: agospoda@redhat.com
> Cc: e1000-devel@lists.sourceforge.net
> ---

Hi,

I'm about to send a patch tomorrow that adds a helper function to get
an affinity_hint suggestion based on numa_node and ring index.
If you'd like, you could use it here as well.

We're still doing internal review on it before sending to the mailing
list, but this will be the declaration of the function:
/*
 * netif_set_rx_queue_affinity_hint - set affinity hint of rx queue
 * @rxq: index of rx queue
 * @numa_node: preferred numa_node
 * @affinity_mask: the relevant cpu bit is set according to the policy
 *
 * This function sets the affinity_mask according to a numa aware policy.
 * affinity_mask could be used as an affinity hint for the IRQ associated
 * with this rx queue.
 * The policy is to spread rx queues across cores - local cores first.
 *
 * Returns 0 on success, or a negative error code.
 */
int netif_set_rx_queue_affinity_hint(int rxq, int numa_node,
                                     cpumask_var_t affinity_mask);
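
A caller such as ixgbe_alloc_q_vector() would use it roughly like this
(just a sketch; v_idx, node and q_vector are the existing locals there,
and the exact semantics may still change while the patch is under
internal review):

	cpumask_var_t hint;
	int err;

	if (!zalloc_cpumask_var(&hint, GFP_KERNEL))
		return -ENOMEM;

	/* Ask for a CPU for this rx queue, preferring the device's node. */
	err = netif_set_rx_queue_affinity_hint(v_idx, node, hint);
	if (!err)
		cpumask_copy(&q_vector->affinity_mask, hint);

	free_cpumask_var(hint);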



Amir
Prarit Bhargava Feb. 25, 2014, 5:43 p.m. UTC | #3
On 02/25/2014 12:27 PM, Amir Vadai wrote:
> On 24/02/14 13:51 -0500, Prarit Bhargava wrote:
>> The ixgbe driver assumes that the cpus on a node are mapped 1:1 with the
>> indexes into arrays.  This is not the case as nodes can contain, for
>> example, cpus 0-7, 33-40.
>>
>> This patch fixes this problem.
>>
>> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
>> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
>> Cc: Bruce Allan <bruce.w.allan@intel.com>
>> Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
>> Cc: Don Skidmore <donald.c.skidmore@intel.com>
>> Cc: Greg Rose <gregory.v.rose@intel.com>
>> Cc: Alex Duyck <alexander.h.duyck@intel.com>
>> Cc: John Ronciak <john.ronciak@intel.com>
>> Cc: Mitch Williams <mitch.a.williams@intel.com>
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: nhorman@redhat.com
>> Cc: agospoda@redhat.com
>> Cc: e1000-devel@lists.sourceforge.net
>> ---
> 
> Hi,
> 
> I'm about to send a patch tomorrow that adds a helper function to get
> an affinity_hint suggestion based on numa_node and ring index.
> If you'd like, you could use it here as well.
> 
> We're still doing internal review on it before sending to the mailing
> list, but this will be the declaration of the function:
> /*
>  * netif_set_rx_queue_affinity_hint - set affinity hint of rx queue
>  * @rxq: index of rx queue
>  * @numa_node: preferred numa_node
>  * @affinity_mask: the relevant cpu bit is set according to the policy
>  *
>  * This function sets the affinity_mask according to a numa aware policy.
>  * affinity_mask could be used as an affinity hint for the IRQ associated
>  * with this rx queue.
>  * The policy is to spread rx queues across cores - local cores first.
>  *
>  * Returns 0 on success, or a negative error code.
>  */
> int netif_set_rx_queue_affinity_hint(int rxq, int numa_node,
>                                      cpumask_var_t affinity_mask);

I'm going to wait for this patch then.  Amir, please cc me.

Thanks for the info,

P.

> 
> 
> 
> Amir

Patch

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index 3668288..8b3992e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -794,11 +794,15 @@  static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
 {
 	struct ixgbe_q_vector *q_vector;
 	struct ixgbe_ring *ring;
-	int node = NUMA_NO_NODE;
-	int cpu = -1;
+	int node = adapter->pdev->dev.numa_node;
+	int cpu, set_affinity = 0;
 	int ring_count, size;
 	u8 tcs = netdev_get_num_tc(adapter->netdev);
 
+	if (node == NUMA_NO_NODE)
+		cpu = -1;
+	else
+		cpu = cpumask_next(v_idx - 1, cpumask_of_node(node));
 	ring_count = txr_count + rxr_count;
 	size = sizeof(struct ixgbe_q_vector) +
 	       (sizeof(struct ixgbe_ring) * ring_count);
@@ -807,10 +811,8 @@  static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
 	if ((tcs <= 1) && !(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)) {
 		u16 rss_i = adapter->ring_feature[RING_F_RSS].indices;
 		if (rss_i > 1 && adapter->atr_sample_rate) {
-			if (cpu_online(v_idx)) {
-				cpu = v_idx;
-				node = cpu_to_node(cpu);
-			}
+			if (likely(cpu_online(cpu)))
+				set_affinity = 1;
 		}
 	}
 
@@ -822,7 +824,7 @@  static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
 		return -ENOMEM;
 
 	/* setup affinity mask and node */
-	if (cpu != -1)
+	if (set_affinity)
 		cpumask_set_cpu(cpu, &q_vector->affinity_mask);
 	q_vector->numa_node = node;