
[net-next,2/4] mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs

Message ID 1480721013-1047541-3-git-send-email-kafai@fb.com
State Superseded, archived
Delegated to: David Miller

Commit Message

Martin KaFai Lau Dec. 2, 2016, 11:23 p.m. UTC
When XDP prog is attached, it is currently limiting
MTU to be FRAG_SZ0 - ETH_HLEN - (2 * VLAN_HLEN) which is 1514
in x86.

AFAICT, since mlx4 is doing one page per packet for XDP,
we can at least raise the MTU limitation up to
PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) which this patch is
doing.  It will be useful in the next patch which allows
XDP program to extend the packet by adding new header(s).

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 28 +++++++++++-----
 drivers/net/ethernet/mellanox/mlx4/en_rx.c     | 46 ++++++++++++++------------
 2 files changed, 44 insertions(+), 30 deletions(-)
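
For context, a minimal back-of-the-envelope sketch of the old and new limits (assuming a 4096-byte x86 page and a first RX fragment size, FRAG_SZ0, of 1536 bytes, consistent with the 1514 figure above; illustrative only, not driver code):

#include <stdio.h>

#define ETH_HLEN        14      /* Ethernet header                  */
#define VLAN_HLEN        4      /* one 802.1Q tag                   */
#define FRAG_SZ0      1536      /* first RX fragment size (assumed) */
#define PAGE_SIZE     4096      /* x86 page size (assumed)          */

int main(void)
{
        /* Old XDP limit: one RX fragment minus eth and two vlan headers. */
        int old_max = FRAG_SZ0 - ETH_HLEN - (2 * VLAN_HLEN);    /* 1514 */

        /* New XDP limit: a whole page minus eth and two vlan headers.    */
        int new_max = PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN);   /* 4074 */

        printf("old XDP max MTU: %d\nnew XDP max MTU: %d\n", old_max, new_max);
        return 0;
}

On an arch with 64 KB pages the same formula would allow an MTU of 65514, which is what the discussion below is concerned about.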

Comments

Rick Jones Dec. 3, 2016, 12:07 a.m. UTC | #1
On 12/02/2016 03:23 PM, Martin KaFai Lau wrote:
> When XDP prog is attached, it is currently limiting
> MTU to be FRAG_SZ0 - ETH_HLEN - (2 * VLAN_HLEN) which is 1514
> in x86.
>
> AFAICT, since mlx4 is doing one page per packet for XDP,
> we can at least raise the MTU limitation up to
> PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) which this patch is
> doing.  It will be useful in the next patch which allows
> XDP program to extend the packet by adding new header(s).

Is mlx4 the only driver doing page-per-packet?

rick jones
Eric Dumazet Dec. 3, 2016, 12:38 a.m. UTC | #2
On Fri, 2016-12-02 at 15:23 -0800, Martin KaFai Lau wrote:
> When XDP prog is attached, it is currently limiting
> MTU to be FRAG_SZ0 - ETH_HLEN - (2 * VLAN_HLEN) which is 1514
> in x86.
> 
> AFAICT, since mlx4 is doing one page per packet for XDP,
> we can at least raise the MTU limitation up to
> PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) which this patch is
> doing.  It will be useful in the next patch which allows
> XDP program to extend the packet by adding new header(s).
> 
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> ---

Have you tested your patch on a host with PAGE_SIZE = 64 KB ?

Looks XDP really kills arches with bigger pages :(

Thanks.
Alexei Starovoitov Dec. 3, 2016, 12:53 a.m. UTC | #3
On 12/2/16 4:38 PM, Eric Dumazet wrote:
> On Fri, 2016-12-02 at 15:23 -0800, Martin KaFai Lau wrote:
>> When XDP prog is attached, it is currently limiting
>> MTU to be FRAG_SZ0 - ETH_HLEN - (2 * VLAN_HLEN) which is 1514
>> in x86.
>>
>> AFAICT, since mlx4 is doing one page per packet for XDP,
>> we can at least raise the MTU limitation up to
>> PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) which this patch is
>> doing.  It will be useful in the next patch which allows
>> XDP program to extend the packet by adding new header(s).
>>
>> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
>> ---
>
> Have you tested your patch on a host with PAGE_SIZE = 64 KB ?
>
> Looks XDP really kills arches with bigger pages :(

I'm afraid xdp mlx[45] support was not tested on arches
with 64k pages at all. Not just this patch.
I think people who care about such archs should test?
Note page per packet is not a hard requirement for all drivers
and all archs. For mlx[45] it was the easiest and the most
convenient way to achieve desired performance.
If there are ways to do the same performance differently,
I'm all ears :)
Eric Dumazet Dec. 3, 2016, 2:15 a.m. UTC | #4
On Fri, 2016-12-02 at 16:53 -0800, Alexei Starovoitov wrote:
> On 12/2/16 4:38 PM, Eric Dumazet wrote:
> > On Fri, 2016-12-02 at 15:23 -0800, Martin KaFai Lau wrote:
> >> When XDP prog is attached, it is currently limiting
> >> MTU to be FRAG_SZ0 - ETH_HLEN - (2 * VLAN_HLEN) which is 1514
> >> in x86.
> >>
> >> AFAICT, since mlx4 is doing one page per packet for XDP,
> >> we can at least raise the MTU limitation up to
> >> PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) which this patch is
> >> doing.  It will be useful in the next patch which allows
> >> XDP program to extend the packet by adding new header(s).
> >>
> >> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> >> ---
> >
> > Have you tested your patch on a host with PAGE_SIZE = 64 KB ?
> >
> > Looks XDP really kills arches with bigger pages :(
> 
> I'm afraid xdp mlx[45] support was not tested on arches
> with 64k pages at all. Not just this patch.
> I think people who care about such archs should test?
> Note page per packet is not a hard requirement for all drivers
> and all archs. For mlx[45] it was the easiest and the most
> convenient way to achieve desired performance.
> If there are ways to do the same performance differently,
> I'm all ears :)
> 

My question was more like:

Can we double check that all these patches won't break the mlx4 driver
(non-XDP path) on arches with PAGE_SIZE=64KB?

I have no plans to use XDP for a while, but I certainly know some
customers are using mlx4 on powerpc.
Martin KaFai Lau Dec. 3, 2016, 3:42 a.m. UTC | #5
On Fri, Dec 02, 2016 at 06:15:26PM -0800, Eric Dumazet wrote:
> On Fri, 2016-12-02 at 16:53 -0800, Alexei Starovoitov wrote:
> > On 12/2/16 4:38 PM, Eric Dumazet wrote:
> > > On Fri, 2016-12-02 at 15:23 -0800, Martin KaFai Lau wrote:
> > >> When XDP prog is attached, it is currently limiting
> > >> MTU to be FRAG_SZ0 - ETH_HLEN - (2 * VLAN_HLEN) which is 1514
> > >> in x86.
> > >>
> > >> AFAICT, since mlx4 is doing one page per packet for XDP,
> > >> we can at least raise the MTU limitation up to
> > >> PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) which this patch is
> > >> doing.  It will be useful in the next patch which allows
> > >> XDP program to extend the packet by adding new header(s).
> > >>
> > >> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > >> ---
> > >
> > > Have you tested your patch on a host with PAGE_SIZE = 64 KB ?
> > >
> > > Looks XDP really kills arches with bigger pages :(
> >
> > I'm afraid xdp mlx[45] support was not tested on arches
> > with 64k pages at all. Not just this patch.
> > I think people who care about such archs should test?
> > Note page per packet is not a hard requirement for all drivers
> > and all archs. For mlx[45] it was the easiest and the most
> > convenient way to achieve desired performance.
> > If there are ways to do the same performance differently,
> > I'm all ears :)
> >
>
> My question was more like:
>
> Can we double check that all these patches won't break the mlx4 driver
> (non-XDP path) on arches with PAGE_SIZE=64KB?
The page/pkt requirement is not added by this patch.  The earlier
XDP patch series already ensured this page/pkt requirement
is effective only when an XDP prog is attached.

In the earlier XDP patches, MTU is limited to 1514 when
XDP is active.  This patch allows full use of the
page for a packet (and also only matters when XDP is active).
Eric Dumazet Dec. 3, 2016, 3:59 a.m. UTC | #6
On Fri, 2016-12-02 at 19:42 -0800, Martin KaFai Lau wrote:
> On Fri, Dec 02, 2016 at 06:15:26PM -0800, Eric Dumazet wrote:
> > My question was more like:
> >
> > Can we double check that all these patches won't break the mlx4 driver
> > (non-XDP path) on arches with PAGE_SIZE=64KB?
> The page/pkt requirement is not added by this patch.  The earlier
> XDP patch series already ensured this page/pkt requirement
> is effective only when an XDP prog is attached.
>
> In the earlier XDP patches, MTU is limited to 1514 when
> XDP is active.  This patch allows full use of the
> page for a packet (and also only matters when XDP is active).

OK, thanks for the clarification.
Martin KaFai Lau Dec. 4, 2016, 3:19 a.m. UTC | #7
On Fri, Dec 02, 2016 at 04:07:09PM -0800, Rick Jones wrote:
> On 12/02/2016 03:23 PM, Martin KaFai Lau wrote:
> >When XDP prog is attached, it is currently limiting
> >MTU to be FRAG_SZ0 - ETH_HLEN - (2 * VLAN_HLEN) which is 1514
> >in x86.
> >
> >AFAICT, since mlx4 is doing one page per packet for XDP,
> >we can at least raise the MTU limitation up to
> >PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) which this patch is
> >doing.  It will be useful in the next patch which allows
> >XDP program to extend the packet by adding new header(s).
>
> Is mlx4 the only driver doing page-per-packet?
Sorry for the late reply.  This allocation scheme is only effective
when XDP is active.  AFAIK, only mlx4/5 supports XDP now.
Saeed Mahameed Dec. 4, 2016, 11:31 p.m. UTC | #8
On Sat, Dec 3, 2016 at 2:53 AM, Alexei Starovoitov <ast@fb.com> wrote:
> On 12/2/16 4:38 PM, Eric Dumazet wrote:
>>
>> On Fri, 2016-12-02 at 15:23 -0800, Martin KaFai Lau wrote:
>>>
>>> When XDP prog is attached, it is currently limiting
>>> MTU to be FRAG_SZ0 - ETH_HLEN - (2 * VLAN_HLEN) which is 1514
>>> in x86.
>>>
>>> AFAICT, since mlx4 is doing one page per packet for XDP,
>>> we can at least raise the MTU limitation up to
>>> PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) which this patch is
>>> doing.  It will be useful in the next patch which allows
>>> XDP program to extend the packet by adding new header(s).
>>>
>>> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
>>> ---
>>
>>
>> Have you tested your patch on a host with PAGE_SIZE = 64 KB ?
>>
>> Looks XDP really kills arches with bigger pages :(
>
>
> I'm afraid xdp mlx[45] support was not tested on arches
> with 64k pages at all. Not just this patch.

Yep, in mlx5 page-per-packet became the default, with or without XDP,
unlike mlx4.  Currently we allow a 64KB page per packet, which is wrong
and needs to be fixed.

I will get to this task soon.

> I think people who care about such archs should test?

We do test mlx5 and mlx4 on the PPC arch.  Other than requiring more
memory than we need, we don't see any issues, but we don't test XDP on
those archs.

> Note page per packet is not a hard requirement for all drivers
> and all archs. For mlx[45] it was the easiest and the most
> convenient way to achieve desired performance.
> If there are ways to do the same performance differently,
> I'm all ears :)
>

With bigger pages, i.e. PAGE_SIZE > 8K, my current low-hanging-fruit
options for mlx5 are
1. Start sharing pages between multiple packets (see the sketch below).
2. Go back to the SKB allocator (allocate a ring of SKBs in advance
rather than a page per packet).

This means that the default RX memory scheme will be different from
XDP's on such archs (XDP will still use page per packet).

Alexei, we should start considering PPC archs for XDP use cases;
demanding a page per packet on those archs is a bit of a heavy
requirement.
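
A rough, purely illustrative sketch of option 1 above (carving several fixed-stride RX buffers out of a single page) could look like the following; all names here are hypothetical and none of this is mlx5 code:

#include <linux/mm.h>

/* Hypothetical illustration only.  Several fixed-stride RX buffers are
 * carved out of one page; each buffer holds its own page reference and
 * releases it with put_page() once its packet is consumed, while the
 * caller drops the original allocation reference after the page has been
 * fully carved up.
 */
struct rx_page_frag {
        struct page *page;      /* backing page shared by several buffers */
        unsigned int offset;    /* byte offset of this buffer in the page */
};

static int fill_frags_from_page(struct page *page, unsigned int stride,
                                struct rx_page_frag *frags, int max_frags)
{
        unsigned int offset = 0;
        int n = 0;

        while (n < max_frags && offset + stride <= PAGE_SIZE) {
                get_page(page);                 /* per-buffer reference */
                frags[n].page = page;
                frags[n].offset = offset;
                offset += stride;
                n++;
        }

        return n;                               /* buffers carved out */
}

With PAGE_SIZE = 64K and a stride of, say, 2K, one page would then back 32 packets instead of one.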
Eric Dumazet Dec. 5, 2016, 12:52 a.m. UTC | #9
On Mon, 2016-12-05 at 01:31 +0200, Saeed Mahameed wrote:

> Alexei, we should start considering PPC archs for XDP use cases,
> demanding page per packet on those archs is a little bit heavy
> requirement

Well, 'little' is an understatement ;)

Note that PPC had serious problems before  commit bd68a2a854ad5a85f0
("net: set SK_MEM_QUANTUM to 4096")

So I suspect one page per frame will likely be a huge problem
for hosts dealing with 10^5 or more TCP sockets.

Either skb->truesize is set to 64KB and TCP window must be really tiny,
or skb->truesize is set to ~2KB and OOM is waiting to happen.
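
To make that trade-off concrete, a back-of-the-envelope sketch (the 1 MB receive buffer and the 2 KB understated truesize are assumed, illustrative values, not kernel defaults):

#include <stdio.h>

int main(void)
{
        long long page_truesize = 64 * 1024; /* one 64 KB page behind each frame   */
        long long payload       = 1500;      /* typical MTU-sized payload          */
        long long rcvbuf        = 1 << 20;   /* assumed 1 MB socket receive buffer */
        long long sockets       = 100000;    /* the 10^5-socket case above         */

        /* Honest accounting: truesize = 64 KB, so only a few frames fit in
         * the receive buffer and the usable TCP window collapses.
         */
        long long frames_honest = rcvbuf / page_truesize;        /* 16      */
        long long window_honest = frames_honest * payload;       /* ~24 KB  */

        /* Understated accounting: truesize ~= 2 KB keeps the window large,
         * but each socket can then really hold far more memory than it is
         * charged for.
         */
        long long frames_lying  = rcvbuf / 2048;                 /* 512     */
        long long real_per_sock = frames_lying * page_truesize;  /* 32 MB   */
        long long fleet_total   = sockets * real_per_sock;       /* ~3 TB   */

        printf("honest truesize:      %lld frames, ~%lld bytes of usable window\n",
               frames_honest, window_honest);
        printf("understated truesize: %lld MB really held per socket, ~%lld GB fleet-wide\n",
               real_per_sock >> 20, fleet_total >> 30);
        return 0;
}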
Jakub Kicinski Dec. 5, 2016, 5:46 p.m. UTC | #10
On Sat, 3 Dec 2016 19:19:58 -0800, Martin KaFai Lau wrote:
> AFAIK, only mlx4/5 supports XDP now.

That is, unfortunately, not true.

$ git checkout net-next/master
$ git grep ndo_xdp drivers/
drivers/net/ethernet/mellanox/mlx4/en_netdev.c: .ndo_xdp                = mlx4_xdp,
drivers/net/ethernet/mellanox/mlx4/en_netdev.c: .ndo_xdp                = mlx4_xdp,
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:      .ndo_xdp                 = mlx5e_xdp,
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:      .ndo_xdp                 = mlx5e_xdp,
drivers/net/ethernet/netronome/nfp/nfp_net_common.c:    .ndo_xdp                = nfp_net_xdp,
drivers/net/ethernet/qlogic/qede/qede_main.c:   .ndo_xdp = qede_xdp,

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 091b904262bc..5df0bbd88d67 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -51,6 +51,8 @@ 
 #include "mlx4_en.h"
 #include "en_port.h"
 
+#define MLX4_EN_MAX_XDP_MTU ((int)(PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN)))
+
 int mlx4_en_setup_tc(struct net_device *dev, u8 up)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
@@ -2262,6 +2264,19 @@  void mlx4_en_destroy_netdev(struct net_device *dev)
 		free_netdev(dev);
 }
 
+static bool mlx4_en_check_xdp_mtu(struct net_device *dev, int mtu)
+{
+	struct mlx4_en_priv *priv = netdev_priv(dev);
+
+	if (mtu > MLX4_EN_MAX_XDP_MTU) {
+		en_err(priv, "mtu:%d > max:%d when XDP prog is attached\n",
+		       mtu, MLX4_EN_MAX_XDP_MTU);
+		return false;
+	}
+
+	return true;
+}
+
 static int mlx4_en_change_mtu(struct net_device *dev, int new_mtu)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
@@ -2271,11 +2286,10 @@  static int mlx4_en_change_mtu(struct net_device *dev, int new_mtu)
 	en_dbg(DRV, priv, "Change MTU called - current:%d new:%d\n",
 		 dev->mtu, new_mtu);
 
-	if (priv->tx_ring_num[TX_XDP] && MLX4_EN_EFF_MTU(new_mtu) > FRAG_SZ0) {
-		en_err(priv, "MTU size:%d requires frags but XDP running\n",
-		       new_mtu);
-		return -EOPNOTSUPP;
-	}
+	if (priv->tx_ring_num[TX_XDP] &&
+	    !mlx4_en_check_xdp_mtu(dev, new_mtu))
+		return -ENOTSUPP;
+
 	dev->mtu = new_mtu;
 
 	if (netif_running(dev)) {
@@ -2723,10 +2737,8 @@  static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 		return 0;
 	}
 
-	if (priv->num_frags > 1) {
-		en_err(priv, "Cannot set XDP if MTU requires multiple frags\n");
+	if (!mlx4_en_check_xdp_mtu(dev, dev->mtu))
 		return -EOPNOTSUPP;
-	}
 
 	tmp = kzalloc(sizeof(*tmp), GFP_KERNEL);
 	if (!tmp)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 6562f78b07f4..23e9d04d1ef4 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -1164,37 +1164,39 @@  static const int frag_sizes[] = {
 
 void mlx4_en_calc_rx_buf(struct net_device *dev)
 {
-	enum dma_data_direction dma_dir = PCI_DMA_FROMDEVICE;
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	int eff_mtu = MLX4_EN_EFF_MTU(dev->mtu);
-	int order = MLX4_EN_ALLOC_PREFER_ORDER;
-	u32 align = SMP_CACHE_BYTES;
-	int buf_size = 0;
 	int i = 0;
 
 	/* bpf requires buffers to be set up as 1 packet per page.
 	 * This only works when num_frags == 1.
 	 */
 	if (priv->tx_ring_num[TX_XDP]) {
-		dma_dir = PCI_DMA_BIDIRECTIONAL;
-		/* This will gain efficient xdp frame recycling at the expense
-		 * of more costly truesize accounting
+		priv->frag_info[0].order = 0;
+		priv->frag_info[0].frag_size = eff_mtu;
+		priv->frag_info[0].frag_prefix_size = 0;
+		/* This will gain efficient xdp frame recycling at the
+		 * expense of more costly truesize accounting
 		 */
-		align = PAGE_SIZE;
-		order = 0;
-	}
-
-	while (buf_size < eff_mtu) {
-		priv->frag_info[i].order = order;
-		priv->frag_info[i].frag_size =
-			(eff_mtu > buf_size + frag_sizes[i]) ?
-				frag_sizes[i] : eff_mtu - buf_size;
-		priv->frag_info[i].frag_prefix_size = buf_size;
-		priv->frag_info[i].frag_stride =
-				ALIGN(priv->frag_info[i].frag_size, align);
-		priv->frag_info[i].dma_dir = dma_dir;
-		buf_size += priv->frag_info[i].frag_size;
-		i++;
+		priv->frag_info[0].frag_stride = PAGE_SIZE;
+		priv->frag_info[0].dma_dir = PCI_DMA_BIDIRECTIONAL;
+		i = 1;
+	} else {
+		int buf_size = 0;
+
+		while (buf_size < eff_mtu) {
+			priv->frag_info[i].order = MLX4_EN_ALLOC_PREFER_ORDER;
+			priv->frag_info[i].frag_size =
+				(eff_mtu > buf_size + frag_sizes[i]) ?
+					frag_sizes[i] : eff_mtu - buf_size;
+			priv->frag_info[i].frag_prefix_size = buf_size;
+			priv->frag_info[i].frag_stride =
+				ALIGN(priv->frag_info[i].frag_size,
+				      SMP_CACHE_BYTES);
+			priv->frag_info[i].dma_dir = PCI_DMA_FROMDEVICE;
+			buf_size += priv->frag_info[i].frag_size;
+			i++;
+		}
 	}
 
 	priv->num_frags = i;