[bpf-next,v4,05/17] xsk: Change the default frame size to 4096 and allow controlling it
diff mbox series

Message ID 20190612155605.22450-6-maximmi@mellanox.com
State Changes Requested
Delegated to: BPF Maintainers
Headers show
Series
  • AF_XDP infrastructure improvements and mlx5e support
Related show

Commit Message

Maxim Mikityanskiy June 12, 2019, 3:56 p.m. UTC
The typical XDP memory scheme is one packet per page. Change the AF_XDP
frame size in libbpf to 4096, which is the page size on x86, to allow
libbpf to be used with the drivers with the packet-per-page scheme.

Add a command line option -f to xdpsock to allow to specify a custom
frame size.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
---
 samples/bpf/xdpsock_user.c | 44 ++++++++++++++++++++++++--------------
 tools/lib/bpf/xsk.h        |  2 +-
 2 files changed, 29 insertions(+), 17 deletions(-)

Comments

Jakub Kicinski June 12, 2019, 8:10 p.m. UTC | #1
On Wed, 12 Jun 2019 15:56:43 +0000, Maxim Mikityanskiy wrote:
> The typical XDP memory scheme is one packet per page. Change the AF_XDP
> frame size in libbpf to 4096, which is the page size on x86, to allow
> libbpf to be used with the drivers with the packet-per-page scheme.

This is slightly surprising.  Why does the driver care about the bufsz?

You're not supposed to so page operations on UMEM pages, anyway.
And the RX size filter should be configured according to MTU regardless
of XDP state.

Can you explain?

> Add a command line option -f to xdpsock to allow to specify a custom
> frame size.
> 
> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
> Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
> Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Maxim Mikityanskiy June 13, 2019, 2:01 p.m. UTC | #2
On 2019-06-12 23:10, Jakub Kicinski wrote:
> On Wed, 12 Jun 2019 15:56:43 +0000, Maxim Mikityanskiy wrote:
>> The typical XDP memory scheme is one packet per page. Change the AF_XDP
>> frame size in libbpf to 4096, which is the page size on x86, to allow
>> libbpf to be used with the drivers with the packet-per-page scheme.
> 
> This is slightly surprising.  Why does the driver care about the bufsz?

The classic XDP implementation supports only the packet-per-page scheme. 
mlx5e implements this scheme, because it perfectly fits with xdp_return 
and page pool APIs. AF_XDP relies on XDP, and even though AF_XDP doesn't 
really allocate or release pages, it works on top of XDP, and XDP 
implementation in mlx5e does allocate and release pages (in general 
case) and works with the packet-per-page scheme.

> You're not supposed to so page operations on UMEM pages, anyway.
> And the RX size filter should be configured according to MTU regardless
> of XDP state.

Yes, of course, MTU is taken into account.

> Can you explain?
> 
>> Add a command line option -f to xdpsock to allow to specify a custom
>> frame size.
>>
>> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
>> Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
>> Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Jakub Kicinski June 13, 2019, 5:29 p.m. UTC | #3
On Thu, 13 Jun 2019 14:01:39 +0000, Maxim Mikityanskiy wrote:
> On 2019-06-12 23:10, Jakub Kicinski wrote:
> > On Wed, 12 Jun 2019 15:56:43 +0000, Maxim Mikityanskiy wrote:  
> >> The typical XDP memory scheme is one packet per page. Change the AF_XDP
> >> frame size in libbpf to 4096, which is the page size on x86, to allow
> >> libbpf to be used with the drivers with the packet-per-page scheme.  
> > 
> > This is slightly surprising.  Why does the driver care about the bufsz?  
> 
> The classic XDP implementation supports only the packet-per-page scheme. 
> mlx5e implements this scheme, because it perfectly fits with xdp_return 
> and page pool APIs. AF_XDP relies on XDP, and even though AF_XDP doesn't 
> really allocate or release pages, it works on top of XDP, and XDP 
> implementation in mlx5e does allocate and release pages (in general 
> case) and works with the packet-per-page scheme.

Yes, okay, I get that.  But I still don't know what's the exact use you
have for AF_XDP buffers being 4k..  Could you point us in the code to
the place which relies on all buffers being 4k in any XDP scenario?

> > You're not supposed to so page operations on UMEM pages, anyway.
> > And the RX size filter should be configured according to MTU regardless
> > of XDP state.  
> 
> Yes, of course, MTU is taken into account.
> 
> > Can you explain?
> >   
> >> Add a command line option -f to xdpsock to allow to specify a custom
> >> frame size.
> >>
> >> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
> >> Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
> >> Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Maxim Mikityanskiy June 14, 2019, 1:25 p.m. UTC | #4
On 2019-06-13 20:29, Jakub Kicinski wrote:
> On Thu, 13 Jun 2019 14:01:39 +0000, Maxim Mikityanskiy wrote:
>> On 2019-06-12 23:10, Jakub Kicinski wrote:
>>> On Wed, 12 Jun 2019 15:56:43 +0000, Maxim Mikityanskiy wrote:
>>>> The typical XDP memory scheme is one packet per page. Change the AF_XDP
>>>> frame size in libbpf to 4096, which is the page size on x86, to allow
>>>> libbpf to be used with the drivers with the packet-per-page scheme.
>>>
>>> This is slightly surprising.  Why does the driver care about the bufsz?
>>
>> The classic XDP implementation supports only the packet-per-page scheme.
>> mlx5e implements this scheme, because it perfectly fits with xdp_return
>> and page pool APIs. AF_XDP relies on XDP, and even though AF_XDP doesn't
>> really allocate or release pages, it works on top of XDP, and XDP
>> implementation in mlx5e does allocate and release pages (in general
>> case) and works with the packet-per-page scheme.
> 
> Yes, okay, I get that.  But I still don't know what's the exact use you
> have for AF_XDP buffers being 4k..  Could you point us in the code to
> the place which relies on all buffers being 4k in any XDP scenario?

1. An XDP program is set on all queues, so to support non-4k AF_XDP 
frames, we would also need to support multiple-packet-per-page XDP for 
regular queues.

2. Page allocation in mlx5e perfectly fits page-sized XDP frames. Some 
examples in the code are:

2.1. mlx5e_free_rx_mpwqe calls a generic mlx5e_page_release to release 
the pages of a MPWQE (multi-packet work queue element), which is 
implemented as xsk_umem_fq_reuse for the case of XSK. We avoid extra 
overhead by using the fact that packet == page.

2.2. mlx5e_free_xdpsq_desc performs cleanup after XDP transmits. In case 
of XDP_TX, we can free/recycle the pages without having a refcount 
overhead, by using the fact that packet == page.

>>> You're not supposed to so page operations on UMEM pages, anyway.
>>> And the RX size filter should be configured according to MTU regardless
>>> of XDP state.
>>
>> Yes, of course, MTU is taken into account.
>>
>>> Can you explain?
>>>    
>>>> Add a command line option -f to xdpsock to allow to specify a custom
>>>> frame size.
>>>>
>>>> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
>>>> Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
>>>> Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Jakub Kicinski June 15, 2019, 1:40 a.m. UTC | #5
On Fri, 14 Jun 2019 13:25:28 +0000, Maxim Mikityanskiy wrote:
> On 2019-06-13 20:29, Jakub Kicinski wrote:
> > On Thu, 13 Jun 2019 14:01:39 +0000, Maxim Mikityanskiy wrote:  
> > 
> > Yes, okay, I get that.  But I still don't know what's the exact use you
> > have for AF_XDP buffers being 4k..  Could you point us in the code to
> > the place which relies on all buffers being 4k in any XDP scenario?  

Okay, I still don't get it, but that's for explaining :)  Perhaps it
will become clearer when you resping with patch 17 split into
reviewable chunks :)

> 1. An XDP program is set on all queues, so to support non-4k AF_XDP 
> frames, we would also need to support multiple-packet-per-page XDP for 
> regular queues.

Mm.. do you have some materials of how the mlx5 DMA/RX works?  I'd think
that if you do single packet per buffer as long as all packets are
guaranteed to fit in the buffer (based on MRU) the HW shouldn't care
what the size of the buffer is.

> 2. Page allocation in mlx5e perfectly fits page-sized XDP frames. Some 
> examples in the code are:
> 
> 2.1. mlx5e_free_rx_mpwqe calls a generic mlx5e_page_release to release 
> the pages of a MPWQE (multi-packet work queue element), which is 
> implemented as xsk_umem_fq_reuse for the case of XSK. We avoid extra 
> overhead by using the fact that packet == page.
> 
> 2.2. mlx5e_free_xdpsq_desc performs cleanup after XDP transmits. In case 
> of XDP_TX, we can free/recycle the pages without having a refcount 
> overhead, by using the fact that packet == page.
Maxim Mikityanskiy June 18, 2019, noon UTC | #6
On 2019-06-15 04:40, Jakub Kicinski wrote:
> On Fri, 14 Jun 2019 13:25:28 +0000, Maxim Mikityanskiy wrote:
>> On 2019-06-13 20:29, Jakub Kicinski wrote:
>>> On Thu, 13 Jun 2019 14:01:39 +0000, Maxim Mikityanskiy wrote:
>>>
>>> Yes, okay, I get that.  But I still don't know what's the exact use you
>>> have for AF_XDP buffers being 4k..  Could you point us in the code to
>>> the place which relies on all buffers being 4k in any XDP scenario?
> 
> Okay, I still don't get it, but that's for explaining :)  Perhaps it
> will become clearer when you resping with patch 17 split into
> reviewable chunks :)

I'm sorry, as I said above, I don't think splitting it is necessary or 
is a good thing to do. I used to have it separated, but I squashed them 
to shorten the series and to avoid having stupid /* TODO: implement */ 
comments in empty functions that are implemented in the next patch. 
Unsquashing them is going to take more time, which I unfortunately don't 
have as I'm flying to Netconf tomorrow and then going on vacation. So, I 
would really like to avoid it unless absolutely necessary. Moreover, it 
won't increase readability - you'll have to jump between the patches to 
see the complete implementation of a single function - it's a single 
feature, after all.

>> 1. An XDP program is set on all queues, so to support non-4k AF_XDP
>> frames, we would also need to support multiple-packet-per-page XDP for
>> regular queues.
> 
> Mm.. do you have some materials of how the mlx5 DMA/RX works?  I'd think
> that if you do single packet per buffer as long as all packets are
> guaranteed to fit in the buffer (based on MRU) the HW shouldn't care
> what the size of the buffer is.

It's not related to hardware, it helps get better performance by 
utilizing page pool in the optimal way (without having refcnt == 2 on 
pages). Maybe Tariq or Saeed could explain it more clearly.

>> 2. Page allocation in mlx5e perfectly fits page-sized XDP frames. Some
>> examples in the code are:
>>
>> 2.1. mlx5e_free_rx_mpwqe calls a generic mlx5e_page_release to release
>> the pages of a MPWQE (multi-packet work queue element), which is
>> implemented as xsk_umem_fq_reuse for the case of XSK. We avoid extra
>> overhead by using the fact that packet == page.
>>
>> 2.2. mlx5e_free_xdpsq_desc performs cleanup after XDP transmits. In case
>> of XDP_TX, we can free/recycle the pages without having a refcount
>> overhead, by using the fact that packet == page.

Patch
diff mbox series

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index d08ee1ab7bb4..86d173a332c1 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -68,6 +68,7 @@  static int opt_queue;
 static int opt_poll;
 static int opt_interval = 1;
 static u32 opt_xdp_bind_flags;
+static int opt_xsk_frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
 static __u32 prog_id;
 
 struct xsk_umem_info {
@@ -276,6 +277,12 @@  static size_t gen_eth_frame(struct xsk_umem_info *umem, u64 addr)
 static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
 {
 	struct xsk_umem_info *umem;
+	struct xsk_umem_config cfg = {
+		.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
+		.frame_size = opt_xsk_frame_size,
+		.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
+	};
 	int ret;
 
 	umem = calloc(1, sizeof(*umem));
@@ -283,7 +290,7 @@  static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
 		exit_with_error(errno);
 
 	ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
-			       NULL);
+			       &cfg);
 	if (ret)
 		exit_with_error(-ret);
 
@@ -323,11 +330,9 @@  static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem)
 				     &idx);
 	if (ret != XSK_RING_PROD__DEFAULT_NUM_DESCS)
 		exit_with_error(-ret);
-	for (i = 0;
-	     i < XSK_RING_PROD__DEFAULT_NUM_DESCS *
-		     XSK_UMEM__DEFAULT_FRAME_SIZE;
-	     i += XSK_UMEM__DEFAULT_FRAME_SIZE)
-		*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) = i;
+	for (i = 0; i < XSK_RING_PROD__DEFAULT_NUM_DESCS; i++)
+		*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) =
+			i * opt_xsk_frame_size;
 	xsk_ring_prod__submit(&xsk->umem->fq,
 			      XSK_RING_PROD__DEFAULT_NUM_DESCS);
 
@@ -346,6 +351,7 @@  static struct option long_options[] = {
 	{"interval", required_argument, 0, 'n'},
 	{"zero-copy", no_argument, 0, 'z'},
 	{"copy", no_argument, 0, 'c'},
+	{"frame-size", required_argument, 0, 'f'},
 	{0, 0, 0, 0}
 };
 
@@ -365,8 +371,9 @@  static void usage(const char *prog)
 		"  -n, --interval=n	Specify statistics update interval (default 1 sec).\n"
 		"  -z, --zero-copy      Force zero-copy mode.\n"
 		"  -c, --copy           Force copy mode.\n"
+		"  -f, --frame-size=n   Set the frame size (must be a power of two, default is %d).\n"
 		"\n";
-	fprintf(stderr, str, prog);
+	fprintf(stderr, str, prog, XSK_UMEM__DEFAULT_FRAME_SIZE);
 	exit(EXIT_FAILURE);
 }
 
@@ -377,7 +384,7 @@  static void parse_command_line(int argc, char **argv)
 	opterr = 0;
 
 	for (;;) {
-		c = getopt_long(argc, argv, "Frtli:q:psSNn:cz", long_options,
+		c = getopt_long(argc, argv, "Frtli:q:psSNn:czf:", long_options,
 				&option_index);
 		if (c == -1)
 			break;
@@ -420,6 +427,9 @@  static void parse_command_line(int argc, char **argv)
 		case 'F':
 			opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
 			break;
+		case 'f':
+			opt_xsk_frame_size = atoi(optarg);
+			break;
 		default:
 			usage(basename(argv[0]));
 		}
@@ -432,6 +442,11 @@  static void parse_command_line(int argc, char **argv)
 		usage(basename(argv[0]));
 	}
 
+	if (opt_xsk_frame_size & (opt_xsk_frame_size - 1)) {
+		fprintf(stderr, "--frame-size=%d is not a power of two\n",
+			opt_xsk_frame_size);
+		usage(basename(argv[0]));
+	}
 }
 
 static void kick_tx(struct xsk_socket_info *xsk)
@@ -583,8 +598,7 @@  static void tx_only(struct xsk_socket_info *xsk)
 
 			for (i = 0; i < BATCH_SIZE; i++) {
 				xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->addr
-					= (frame_nb + i) <<
-					XSK_UMEM__DEFAULT_FRAME_SHIFT;
+					= (frame_nb + i) * opt_xsk_frame_size;
 				xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->len =
 					sizeof(pkt_data) - 1;
 			}
@@ -661,21 +675,19 @@  int main(int argc, char **argv)
 	}
 
 	ret = posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
-			     NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE);
+			     NUM_FRAMES * opt_xsk_frame_size);
 	if (ret)
 		exit_with_error(ret);
 
        /* Create sockets... */
-	umem = xsk_configure_umem(bufs,
-				  NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE);
+	umem = xsk_configure_umem(bufs, NUM_FRAMES * opt_xsk_frame_size);
 	xsks[num_socks++] = xsk_configure_socket(umem);
 
 	if (opt_bench == BENCH_TXONLY) {
 		int i;
 
-		for (i = 0; i < NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE;
-		     i += XSK_UMEM__DEFAULT_FRAME_SIZE)
-			(void)gen_eth_frame(umem, i);
+		for (i = 0; i < NUM_FRAMES; i++)
+			(void)gen_eth_frame(umem, i * opt_xsk_frame_size);
 	}
 
 	signal(SIGINT, int_exit);
diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
index 82ea71a0f3ec..833a6e60d065 100644
--- a/tools/lib/bpf/xsk.h
+++ b/tools/lib/bpf/xsk.h
@@ -167,7 +167,7 @@  LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
 
 #define XSK_RING_CONS__DEFAULT_NUM_DESCS      2048
 #define XSK_RING_PROD__DEFAULT_NUM_DESCS      2048
-#define XSK_UMEM__DEFAULT_FRAME_SHIFT    11 /* 2048 bytes */
+#define XSK_UMEM__DEFAULT_FRAME_SHIFT    12 /* 4096 bytes */
 #define XSK_UMEM__DEFAULT_FRAME_SIZE     (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
 #define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0