af_packet: Add Queue mapping mode to af_packet fanout operation

Message ID 1390424504-18543-1-git-send-email-nhorman@tuxdriver.com
State Accepted, archived
Delegated to: David Miller

Commit Message

Neil Horman Jan. 22, 2014, 9:01 p.m. UTC
This patch adds a queue mapping mode to the fanout operation of af_packet
sockets.  This allows user space af_packet users to better filter on flows
ingressing and egressing via a specific hardware queue, and avoids the potential
packet reordering that can occur when FANOUT_CPU is being used and irq affinity
varies.

Tested successfully by myself.  applies to net-next

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
---
 include/uapi/linux/if_packet.h |  1 +
 net/packet/af_packet.c         | 11 +++++++++++
 2 files changed, 12 insertions(+)
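
For readers who want to try the new mode from user space, here is a minimal sketch of joining a fanout group with PACKET_FANOUT_QM. It assumes the usual PACKET_FANOUT argument layout (group id in the low 16 bits, mode and flags in the high 16 bits, matching the `u16 id, u16 type_flags` parameters of fanout_add() in the patch). The helper names fanout_arg() and open_qm_fanout_socket() are hypothetical and not part of the patch, and the fallback #define only mirrors the value added to if_packet.h for systems whose installed headers predate it.

```c
#include <arpa/inet.h>        /* htons */
#include <linux/if_ether.h>   /* ETH_P_ALL */
#include <linux/if_packet.h>  /* PACKET_FANOUT */
#include <stdint.h>
#include <sys/socket.h>       /* socket, setsockopt, SOL_PACKET */
#include <unistd.h>           /* close */

/* Value taken from the patch; older installed headers may not define it. */
#ifndef PACKET_FANOUT_QM
#define PACKET_FANOUT_QM 5
#endif

/*
 * Hypothetical helper: pack the PACKET_FANOUT option value.
 * Low 16 bits carry the group id, high 16 bits carry the mode and flags.
 */
int fanout_arg(uint16_t group_id, uint16_t type_flags)
{
	return (int)(((uint32_t)type_flags << 16) | group_id);
}

/*
 * Hypothetical helper: open a packet socket and join fanout group
 * `group_id` in queue mapping mode. Returns the fd, or -1 on error.
 * Creating the socket needs CAP_NET_RAW.
 */
int open_qm_fanout_socket(uint16_t group_id)
{
	int arg = fanout_arg(group_id, PACKET_FANOUT_QM);
	int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

	if (fd < 0)
		return -1;
	if (setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Each worker would call open_qm_fanout_socket() with the same group id. Opening one socket per hardware queue is the natural configuration, since the mode selects a socket from the queue mapping modulo the number of sockets in the group.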

Comments

Daniel Borkmann Jan. 22, 2014, 10 p.m. UTC | #1
On 01/22/2014 10:01 PM, Neil Horman wrote:
> This patch adds a queue mapping mode to the fanout operation of af_packet
> sockets.  This allows user space af_packet users to better filter on flows
> ingressing and egressing via a specific hardware queue, and avoids the potential

Maybe I'm missing something, but I currently cannot find where this is
being filled out for ingress path? Egress, ok, this gets filled out
somewhere in protocol layers or elsewhere and is being locally pushed
back through dev_queue_xmit_nit(), but I think main use case is ingress
through packet fanout. In driver layer I can find skb->rxhash filled out
which would then be PACKET_FANOUT_HASH.
(Otherwise patch looks good.)

> packet reordering that can occur when FANOUT_CPU is being used and irq affinity
> varies.
>
> Tested successfully by myself.  applies to net-next
>
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: "David S. Miller" <davem@davemloft.net>
> ---
>   include/uapi/linux/if_packet.h |  1 +
>   net/packet/af_packet.c         | 11 +++++++++++
>   2 files changed, 12 insertions(+)
>
> diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h
> index 1988a02..bac27fa 100644
> --- a/include/uapi/linux/if_packet.h
> +++ b/include/uapi/linux/if_packet.h
> @@ -60,6 +60,7 @@ struct sockaddr_ll {
>   #define PACKET_FANOUT_CPU		2
>   #define PACKET_FANOUT_ROLLOVER		3
>   #define PACKET_FANOUT_RND		4
> +#define PACKET_FANOUT_QM		5
>   #define PACKET_FANOUT_FLAG_ROLLOVER	0x1000
>   #define PACKET_FANOUT_FLAG_DEFRAG	0x8000
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index d711ecb..bd90a87 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -1313,6 +1313,13 @@ static unsigned int fanout_demux_rollover(struct packet_fanout *f,
>   	return idx;
>   }
>
> +static unsigned int fanout_demux_qm(struct packet_fanout *f,
> +				    struct sk_buff *skb,
> +				    unsigned int num)
> +{
> +	return skb_get_queue_mapping(skb) % num;
> +}
> +
>   static bool fanout_has_flag(struct packet_fanout *f, u16 flag)
>   {
>   	return f->flags & (flag >> 8);
> @@ -1352,6 +1359,9 @@ static int packet_rcv_fanout(struct sk_buff *skb, struct net_device *dev,
>   	case PACKET_FANOUT_RND:
>   		idx = fanout_demux_rnd(f, skb, num);
>   		break;
> +	case PACKET_FANOUT_QM:
> +		idx = fanout_demux_qm(f, skb, num);
> +		break;
>   	case PACKET_FANOUT_ROLLOVER:
>   		idx = fanout_demux_rollover(f, skb, 0, (unsigned int) -1, num);
>   		break;
> @@ -1422,6 +1432,7 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
>   	case PACKET_FANOUT_LB:
>   	case PACKET_FANOUT_CPU:
>   	case PACKET_FANOUT_RND:
> +	case PACKET_FANOUT_QM:
>   		break;
>   	default:
>   		return -EINVAL;
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Jan. 23, 2014, 1:08 a.m. UTC | #2
On Wed, 2014-01-22 at 23:00 +0100, Daniel Borkmann wrote:
> On 01/22/2014 10:01 PM, Neil Horman wrote:
> > This patch adds a queue mapping mode to the fanout operation of af_packet
> > sockets.  This allows user space af_packet users to better filter on flows
> > ingressing and egressing via a specific hardware queue, and avoids the potential
> 
> Maybe I'm missing something, but I currently cannot find where this is
> being filled out for ingress path? Egress, ok, this gets filled out
> somewhere in protocol layers or elsewhere and is being locally pushed
> back through dev_queue_xmit_nit(), but I think main use case is ingress
> through packet fanout. In driver layer I can find skb->rxhash filled out
> which would then be PACKET_FANOUT_HASH.
> (Otherwise patch looks good.)


Check for various multiqueue drivers calling skb_record_rx_queue()
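
To make the ingress case concrete, here is a small user-space model of how the recorded RX queue feeds the new demux function. It is only a sketch: struct toy_skb and the toy_* functions are hypothetical stand-ins, and the "+ 1" bias in toy_record_rx_queue() reflects my reading of skb_record_rx_queue() in include/linux/skbuff.h, so treat it as an assumption to check against the tree the patch applies to. Only the modulo step is taken directly from the patch.

```c
#include <stdint.h>

/* Toy stand-in for the single sk_buff field that matters here. */
struct toy_skb {
	uint16_t queue_mapping;
};

/*
 * Models skb_record_rx_queue(): the driver stores the RX queue index
 * biased by one (assumption, see the note above).
 */
void toy_record_rx_queue(struct toy_skb *skb, uint16_t rx_queue)
{
	skb->queue_mapping = (uint16_t)(rx_queue + 1);
}

/* Models skb_get_queue_mapping(): returns the raw field, bias included. */
uint16_t toy_get_queue_mapping(const struct toy_skb *skb)
{
	return skb->queue_mapping;
}

/* Same arithmetic as fanout_demux_qm() in the patch. */
unsigned int toy_fanout_demux_qm(const struct toy_skb *skb, unsigned int num)
{
	return toy_get_queue_mapping(skb) % num;
}
```

Under that assumption, every packet from a given RX queue lands on the same fanout socket regardless of which CPU handled the interrupt, although the socket index is offset by one from the queue index (RX queue 0 selects socket 1 in a group of four).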



David Miller Jan. 23, 2014, 1:37 a.m. UTC | #3
From: Neil Horman <nhorman@tuxdriver.com>
Date: Wed, 22 Jan 2014 16:01:44 -0500

> This patch adds a queue mapping mode to the fanout operation of af_packet
> sockets.  This allows user space af_packet users to better filter on flows
> ingressing and egressing via a specific hardware queue, and avoids the potential
> packet reordering that can occur when FANOUT_CPU is being used and irq affinity
> varies.
> 
> Tested successfully by myself.  applies to net-next
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

This looks fine, applied, thanks.
Daniel Borkmann Jan. 23, 2014, 7:56 a.m. UTC | #4
On 01/23/2014 02:08 AM, Eric Dumazet wrote:
> On Wed, 2014-01-22 at 23:00 +0100, Daniel Borkmann wrote:
>> On 01/22/2014 10:01 PM, Neil Horman wrote:
>>> This patch adds a queue mapping mode to the fanout operation of af_packet
>>> sockets.  This allows user space af_packet users to better filter on flows
>>> ingressing and egressing via a specific hardware queue, and avoids the potential
>>
>> Maybe I'm missing something, but I currently cannot find where this is
>> being filled out for ingress path? Egress, ok, this gets filled out
>> somewhere in protocol layers or elsewhere and is being locally pushed
>> back through dev_queue_xmit_nit(), but I think main use case is ingress
>> through packet fanout. In driver layer I can find skb->rxhash filled out
>> which would then be PACKET_FANOUT_HASH.
>> (Otherwise patch looks good.)
>
> Check for various multiqueue drivers calling skb_record_rx_queue()

Okay, thanks for the clarification Eric, patch looks good then.
Patch

diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h
index 1988a02..bac27fa 100644
--- a/include/uapi/linux/if_packet.h
+++ b/include/uapi/linux/if_packet.h
@@ -60,6 +60,7 @@  struct sockaddr_ll {
 #define PACKET_FANOUT_CPU		2
 #define PACKET_FANOUT_ROLLOVER		3
 #define PACKET_FANOUT_RND		4
+#define PACKET_FANOUT_QM		5
 #define PACKET_FANOUT_FLAG_ROLLOVER	0x1000
 #define PACKET_FANOUT_FLAG_DEFRAG	0x8000
 
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index d711ecb..bd90a87 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1313,6 +1313,13 @@  static unsigned int fanout_demux_rollover(struct packet_fanout *f,
 	return idx;
 }
 
+static unsigned int fanout_demux_qm(struct packet_fanout *f,
+				    struct sk_buff *skb,
+				    unsigned int num)
+{
+	return skb_get_queue_mapping(skb) % num;
+}
+
 static bool fanout_has_flag(struct packet_fanout *f, u16 flag)
 {
 	return f->flags & (flag >> 8);
@@ -1352,6 +1359,9 @@  static int packet_rcv_fanout(struct sk_buff *skb, struct net_device *dev,
 	case PACKET_FANOUT_RND:
 		idx = fanout_demux_rnd(f, skb, num);
 		break;
+	case PACKET_FANOUT_QM:
+		idx = fanout_demux_qm(f, skb, num);
+		break;
 	case PACKET_FANOUT_ROLLOVER:
 		idx = fanout_demux_rollover(f, skb, 0, (unsigned int) -1, num);
 		break;
@@ -1422,6 +1432,7 @@  static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
 	case PACKET_FANOUT_LB:
 	case PACKET_FANOUT_CPU:
 	case PACKET_FANOUT_RND:
+	case PACKET_FANOUT_QM:
 		break;
 	default:
 		return -EINVAL;