diff mbox series

[bpf-next,4/4] ixgbe, xsk: use XSK_NAPI_WEIGHT as NAPI poll budget

Message ID 20200907150217.30888-5-bjorn.topel@gmail.com
State Changes Requested
Delegated to: BPF Maintainers
Headers show
Series xsk: increase NAPI budget for AF_XDP zero-copy path | expand

Commit Message

Björn Töpel Sept. 7, 2020, 3:02 p.m. UTC
From: Björn Töpel <bjorn.topel@intel.com>

Start using XSK_NAPI_WEIGHT as NAPI poll budget for the AF_XDP Rx
zero-copy path.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jakub Kicinski Sept. 7, 2020, 7:32 p.m. UTC | #1
On Mon,  7 Sep 2020 17:02:17 +0200 Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> Start using XSK_NAPI_WEIGHT as NAPI poll budget for the AF_XDP Rx
> zero-copy path.
> 
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> index 3771857cf887..f32c1ba0d237 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> @@ -239,7 +239,7 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
>  	bool failure = false;
>  	struct sk_buff *skb;
>  
> -	while (likely(total_rx_packets < budget)) {
> +	while (likely(total_rx_packets < XSK_NAPI_WEIGHT)) {

I was thinking that we'd multiply 'budget' here, not replace it with a
constant. Looks like ixgbe dutifully passes 'per_ring_budget' into the
clean_rx functions, not a complete NAPI budget.

>  		union ixgbe_adv_rx_desc *rx_desc;
>  		struct ixgbe_rx_buffer *bi;
>  		unsigned int size;
Björn Töpel Sept. 8, 2020, 5:38 a.m. UTC | #2
On 2020-09-07 21:32, Jakub Kicinski wrote:
> On Mon,  7 Sep 2020 17:02:17 +0200 Björn Töpel wrote:
>> From: Björn Töpel <bjorn.topel@intel.com>
>>
>> Start using XSK_NAPI_WEIGHT as NAPI poll budget for the AF_XDP Rx
>> zero-copy path.
>>
>> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
>> ---
>>   drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
>> index 3771857cf887..f32c1ba0d237 100644
>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
>> @@ -239,7 +239,7 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
>>   	bool failure = false;
>>   	struct sk_buff *skb;
>>   
>> -	while (likely(total_rx_packets < budget)) {
>> +	while (likely(total_rx_packets < XSK_NAPI_WEIGHT)) {
> 
> I was thinking that we'd multiply 'budget' here, not replace it with a
> constant. Looks like ixgbe dutifully passes 'per_ring_budget' into the
> clean_rx functions, not a complete NAPI budget.
>

Correct, and i40e/ice does the same ("per_ring_budget").

As for budget << XSK_NAPI_MULT vs replacing; Replacing the budget is 
more in line with what the drivers do for the Tx cleanup 
(xxx_clean_tx_irq), where the napi budget is discarded completely; 
Again, with the idea that "this is much cheaper than a "per-packet 
through the stack".

Do you prefer the multiplier way that you describe?


Cheers,
Björn


>>   		union ixgbe_adv_rx_desc *rx_desc;
>>   		struct ixgbe_rx_buffer *bi;
>>   		unsigned int size;
>
Eric Dumazet Sept. 8, 2020, 9:45 a.m. UTC | #3
On 9/7/20 5:02 PM, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> Start using XSK_NAPI_WEIGHT as NAPI poll budget for the AF_XDP Rx
> zero-copy path.
> 
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> index 3771857cf887..f32c1ba0d237 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> @@ -239,7 +239,7 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
>  	bool failure = false;
>  	struct sk_buff *skb;
>  
> -	while (likely(total_rx_packets < budget)) {
> +	while (likely(total_rx_packets < XSK_NAPI_WEIGHT)) {
>  		union ixgbe_adv_rx_desc *rx_desc;
>  		struct ixgbe_rx_buffer *bi;
>  		unsigned int size

This is a violation of NAPI API. IXGBE is already diverging a bit from best practices.

There are reasons we want to control the budget from callers,
if you want bigger budget just increase it instead of using your own ?

I would rather use a generic patch.

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7bd4fcdd0738a718d8b0f7134523cd87e4dcdb7b..33bcbdb6fef488983438c6584e3cbb0a44febb1a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2311,11 +2311,14 @@ static inline void *netdev_priv(const struct net_device *dev)
  */
 #define SET_NETDEV_DEVTYPE(net, devtype)       ((net)->dev.type = (devtype))
 
-/* Default NAPI poll() weight
- * Device drivers are strongly advised to not use bigger value
- */
+/* Default NAPI poll() weight. Highly recommended. */
 #define NAPI_POLL_WEIGHT 64
 
+/* Device drivers are strongly advised to not use bigger value,
+ * as this might cause latencies in stress conditions.
+ */
+#define NAPI_POLL_WEIGHT_MAX 256
+
 /**
  *     netif_napi_add - initialize a NAPI context
  *     @dev:  network device
diff --git a/net/core/dev.c b/net/core/dev.c
index 4086d335978c1bf62bd3965bd2ea96a4ac06b13d..496713fb6075bd8e5e22725e7c817172858e1dd7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6608,7 +6608,7 @@ void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
        INIT_LIST_HEAD(&napi->rx_list);
        napi->rx_count = 0;
        napi->poll = poll;
-       if (weight > NAPI_POLL_WEIGHT)
+       if (weight > NAPI_POLL_WEIGHT_MAX)
                netdev_err_once(dev, "%s() called with weight %d\n", __func__,
                                weight);
        napi->weight = weight;
Paul Menzel Sept. 8, 2020, 10:12 a.m. UTC | #4
Dear Björn,


Am 07.09.20 um 17:02 schrieb Björn Töpel:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> Start using XSK_NAPI_WEIGHT as NAPI poll budget for the AF_XDP Rx
> zero-copy path.

Could you please add the description from the patch series cover letter 
to this commit too? To my knowledge, the message in the cover letter 
won’t be stored in the git repository.

> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>   drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)

[…]


Kind regards,

Paul
Björn Töpel Sept. 8, 2020, 11:12 a.m. UTC | #5
On 2020-09-08 12:12, Paul Menzel wrote:
> Dear Björn,
> 
> 
> Am 07.09.20 um 17:02 schrieb Björn Töpel:
>> From: Björn Töpel <bjorn.topel@intel.com>
>>
>> Start using XSK_NAPI_WEIGHT as NAPI poll budget for the AF_XDP Rx
>> zero-copy path.
> 
> Could you please add the description from the patch series cover letter 
> to this commit too? To my knowledge, the message in the cover letter 
> won’t be stored in the git repository.
>

Paul, thanks for the input! The netdev/bpf trees always include the 
cover letter in the merge commit.


Cheers,
Björn

>> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
>> ---
>>   drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> […]
> 
> 
> Kind regards,
> 
> Paul
Paul Menzel Sept. 8, 2020, 11:20 a.m. UTC | #6
Dear Björn,


Am 08.09.20 um 13:12 schrieb Björn Töpel:
> On 2020-09-08 12:12, Paul Menzel wrote:

>> Am 07.09.20 um 17:02 schrieb Björn Töpel:
>>> From: Björn Töpel <bjorn.topel@intel.com>
>>>
>>> Start using XSK_NAPI_WEIGHT as NAPI poll budget for the AF_XDP Rx
>>> zero-copy path.
>>
>> Could you please add the description from the patch series cover 
>> letter to this commit too? To my knowledge, the message in the cover 
>> letter won’t be stored in the git repository.
> 
> Paul, thanks for the input! The netdev/bpf trees always include the 
> cover letter in the merge commit.

Yes, for pull/merge requests. But you posted them to the list, so I’d 
assume they will be applied with `git am` and not merged, or am I 
missing something. Could you please point me to a merge commit where the 
patches were posted to the list?


Kind regards,

Paul
Björn Töpel Sept. 8, 2020, 11:43 a.m. UTC | #7
On 2020-09-08 13:20, Paul Menzel wrote:
[...]>> Paul, thanks for the input! The netdev/bpf trees always include the
>> cover letter in the merge commit.
> 
> Yes, for pull/merge requests. But you posted them to the list, so I’d 
> assume they will be applied with `git am` and not merged, or am I 
> missing something. Could you please point me to a merge commit where the 
> patches were posted to the list?
> 

An example: A series is posted to the list [1], and when merged the 
merge commit look like [2].


Thanks,
Björn


[1] 
https://lore.kernel.org/bpf/20200520192103.355233-1-bjorn.topel@gmail.com/
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=79917b242c3fe0d89e4752bc25ffef4574c2194b
Björn Töpel Sept. 8, 2020, 11:49 a.m. UTC | #8
On 2020-09-08 11:45, Eric Dumazet wrote:
> 
> 
> On 9/7/20 5:02 PM, Björn Töpel wrote:
>> From: Björn Töpel <bjorn.topel@intel.com>
>>
>> Start using XSK_NAPI_WEIGHT as NAPI poll budget for the AF_XDP Rx
>> zero-copy path.
>>
>> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
>> ---
>>   drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
>> index 3771857cf887..f32c1ba0d237 100644
>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
>> @@ -239,7 +239,7 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
>>   	bool failure = false;
>>   	struct sk_buff *skb;
>>   
>> -	while (likely(total_rx_packets < budget)) {
>> +	while (likely(total_rx_packets < XSK_NAPI_WEIGHT)) {
>>   		union ixgbe_adv_rx_desc *rx_desc;
>>   		struct ixgbe_rx_buffer *bi;
>>   		unsigned int size
> 
> This is a violation of NAPI API. IXGBE is already diverging a bit from best practices.
>

Thanks for having a look, Eric! By diverging from best practices, do
you mean that multiple queues share one NAPI context, and the budget
is split over the queues (say, 4 queues, 64/4 per queue), or that Tx
simply ignores the budget? Or both?

> There are reasons we want to control the budget from callers,
> if you want bigger budget just increase it instead of using your own ?
> 
> I would rather use a generic patch.
>

Hmm, so a configurable NAPI budget for, say, the AF_XDP enabled
queues/NAPIs? Am I reading that correct? (Note that this is *only* for
the AF_XDP enabled queues.)

I'll try to rework this to something more palatable.


Thanks,
Björn


> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 7bd4fcdd0738a718d8b0f7134523cd87e4dcdb7b..33bcbdb6fef488983438c6584e3cbb0a44febb1a 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -2311,11 +2311,14 @@ static inline void *netdev_priv(const struct net_device *dev)
>    */
>   #define SET_NETDEV_DEVTYPE(net, devtype)       ((net)->dev.type = (devtype))
>   
> -/* Default NAPI poll() weight
> - * Device drivers are strongly advised to not use bigger value
> - */
> +/* Default NAPI poll() weight. Highly recommended. */
>   #define NAPI_POLL_WEIGHT 64
>   
> +/* Device drivers are strongly advised to not use bigger value,
> + * as this might cause latencies in stress conditions.
> + */
> +#define NAPI_POLL_WEIGHT_MAX 256
> +
>   /**
>    *     netif_napi_add - initialize a NAPI context
>    *     @dev:  network device
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 4086d335978c1bf62bd3965bd2ea96a4ac06b13d..496713fb6075bd8e5e22725e7c817172858e1dd7 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6608,7 +6608,7 @@ void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
>          INIT_LIST_HEAD(&napi->rx_list);
>          napi->rx_count = 0;
>          napi->poll = poll;
> -       if (weight > NAPI_POLL_WEIGHT)
> +       if (weight > NAPI_POLL_WEIGHT_MAX)
>                  netdev_err_once(dev, "%s() called with weight %d\n", __func__,
>                                  weight);
>          napi->weight = weight;
>
Eric Dumazet Sept. 8, 2020, 3:12 p.m. UTC | #9
On 9/8/20 1:49 PM, Björn Töpel wrote:
> On 2020-09-08 11:45, Eric Dumazet wrote:
>>
>>
>> On 9/7/20 5:02 PM, Björn Töpel wrote:
>>> From: Björn Töpel <bjorn.topel@intel.com>
>>>
>>> Start using XSK_NAPI_WEIGHT as NAPI poll budget for the AF_XDP Rx
>>> zero-copy path.
>>>
>>> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
>>> ---
>>>   drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
>>> index 3771857cf887..f32c1ba0d237 100644
>>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
>>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
>>> @@ -239,7 +239,7 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
>>>       bool failure = false;
>>>       struct sk_buff *skb;
>>>   -    while (likely(total_rx_packets < budget)) {
>>> +    while (likely(total_rx_packets < XSK_NAPI_WEIGHT)) {
>>>           union ixgbe_adv_rx_desc *rx_desc;
>>>           struct ixgbe_rx_buffer *bi;
>>>           unsigned int size
>>
>> This is a violation of NAPI API. IXGBE is already diverging a bit from best practices.
>>
> 
> Thanks for having a look, Eric! By diverging from best practices, do
> you mean that multiple queues share one NAPI context, and the budget
> is split over the queues (say, 4 queues, 64/4 per queue), or that Tx
> simply ignores the budget? Or both?

For instance, having ixgbe_poll() doing :

return min(work_done, budget - 1);

is quite interesting. It could hide bugs like :

I got a budget of 4, and processed 9999 packets because my maths have a bug,
but make sure to pretend we processed 3 packets...


> 
>> There are reasons we want to control the budget from callers,
>> if you want bigger budget just increase it instead of using your own ?
>>
>> I would rather use a generic patch.
>>
> 
> Hmm, so a configurable NAPI budget for, say, the AF_XDP enabled
> queues/NAPIs? Am I reading that correct? (Note that this is *only* for
> the AF_XDP enabled queues.)
> 
> I'll try to rework this to something more palatable.
> 
> 
> Thanks,
> Björn
> 
>
diff mbox series

Patch

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 3771857cf887..f32c1ba0d237 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -239,7 +239,7 @@  int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
 	bool failure = false;
 	struct sk_buff *skb;
 
-	while (likely(total_rx_packets < budget)) {
+	while (likely(total_rx_packets < XSK_NAPI_WEIGHT)) {
 		union ixgbe_adv_rx_desc *rx_desc;
 		struct ixgbe_rx_buffer *bi;
 		unsigned int size;