diff mbox

[net-next,v2,2/4] bpf, mlx5: fix various refcount issues in mlx5e_xdp_set

Message ID 582C64FC.4010307@iogearbox.net
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Daniel Borkmann Nov. 16, 2016, 1:54 p.m. UTC
On 11/16/2016 01:25 PM, Saeed Mahameed wrote:
> On Wed, Nov 16, 2016 at 2:04 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>> There are multiple issues in mlx5e_xdp_set():
>>
>> 1) The batched bpf_prog_add() is currently not checked for errors! When
>>     doing so, it should be done at an earlier point in time to makes sure
>>     that we cannot fail anymore at the time we want to set the program for
>>     each channel. This only means that we have to undo the bpf_prog_add()
>>     in case we return due to required reset.
>>
>> 2) When swapping the priv->xdp_prog, then no extra reference count must be
>>     taken since we got that from call path via dev_change_xdp_fd() already.
>>     Otherwise, we'd never be able to free the program. Also, bpf_prog_add()
>>     without checking the return code could fail.
>>
>> Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>> ---
>>   drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 25 ++++++++++++++++++-----
>>   1 file changed, 20 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> index ab0c336..cf26672 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> @@ -3142,6 +3142,17 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
>>                  goto unlock;
>>          }
>>
>> +       if (prog) {
>> +               /* num_channels is invariant here, so we can take the
>> +                * batched reference right upfront.
>> +                */
>> +               prog = bpf_prog_add(prog, priv->params.num_channels);
>> +               if (IS_ERR(prog)) {
>> +                       err = PTR_ERR(prog);
>> +                       goto unlock;
>> +               }
>> +       }
>> +
>
> With this approach you will end up taking a ref count twice per RQ! on
> the first time xdp_set is called i.e (old_prog = NULL, prog != NULL).
> One ref will be taken per RQ/Channel from the above code, and since
> reset will be TRUE mlx5e_open_locked will be called and another ref
> count will be taken on mlx5e_create_rq.
>
> The issue here is that we have two places for ref count accounting,
> xdp_set and  mlx5e_create_rq. Having ref-count updates in
> mlx5e_create_rq is essential for num_channels configuration changes
> (mlx5e_set_ringparam).
>
> Our previous approach made sure that only one path will do the ref
> counting (mlx5e_open_locked vs. mlx5e_xdp_set batch ref update when
> reset == false).

That is correct, for a short time bpf_prog_add() was charged also when
we reset. When we reset, we will then jump to unlock_put and safely undo
this since we took ref from mlx5e_create_rq() already in that case, and
return successfully. That was intentional, so that eventually we end up
just taking a /single/ ref per RQ when we exit mlx5e_xdp_set(), but more
below ...

[...]
> 2. Keep the current approach and make sure to not update the ref count
> twice, you can batch update only if (!reset && was_open) otherwise you
> can rely on mlx5e_open_locked to take the num_channels ref count for
> you.
>
> Personally I prefer option 2, since we will keep the current logic
> which will allow configuration updates such as (mlx5e_set_ringparam)
> to not worry about ref counting since it will be done in the reset
> flow.

... agree on keeping current approach. I actually like the idea, so we'd
end up with this simpler version for the batched ref then.

Note, your "bpf_prog_add(prog, 1) // one ref for the device." is not needed
since this we already got that one through dev_change_xdp_fd() as mentioned.

Comments

Saeed Mahameed Nov. 16, 2016, 2:30 p.m. UTC | #1
On Wed, Nov 16, 2016 at 3:54 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 11/16/2016 01:25 PM, Saeed Mahameed wrote:
>>
>> On Wed, Nov 16, 2016 at 2:04 AM, Daniel Borkmann <daniel@iogearbox.net>
>> wrote:
>>>
>>> There are multiple issues in mlx5e_xdp_set():
>>>
>>> 1) The batched bpf_prog_add() is currently not checked for errors! When
>>>     doing so, it should be done at an earlier point in time to makes sure
>>>     that we cannot fail anymore at the time we want to set the program
>>> for
>>>     each channel. This only means that we have to undo the bpf_prog_add()
>>>     in case we return due to required reset.
>>>
>>> 2) When swapping the priv->xdp_prog, then no extra reference count must
>>> be
>>>     taken since we got that from call path via dev_change_xdp_fd()
>>> already.
>>>     Otherwise, we'd never be able to free the program. Also,
>>> bpf_prog_add()
>>>     without checking the return code could fail.
>>>
>>> Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
>>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>>> ---
>>>   drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 25
>>> ++++++++++++++++++-----
>>>   1 file changed, 20 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> index ab0c336..cf26672 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> @@ -3142,6 +3142,17 @@ static int mlx5e_xdp_set(struct net_device
>>> *netdev, struct bpf_prog *prog)
>>>                  goto unlock;
>>>          }
>>>
>>> +       if (prog) {
>>> +               /* num_channels is invariant here, so we can take the
>>> +                * batched reference right upfront.
>>> +                */
>>> +               prog = bpf_prog_add(prog, priv->params.num_channels);
>>> +               if (IS_ERR(prog)) {
>>> +                       err = PTR_ERR(prog);
>>> +                       goto unlock;
>>> +               }
>>> +       }
>>> +
>>
>>
>> With this approach you will end up taking a ref count twice per RQ! on
>> the first time xdp_set is called i.e (old_prog = NULL, prog != NULL).
>> One ref will be taken per RQ/Channel from the above code, and since
>> reset will be TRUE mlx5e_open_locked will be called and another ref
>> count will be taken on mlx5e_create_rq.
>>
>> The issue here is that we have two places for ref count accounting,
>> xdp_set and  mlx5e_create_rq. Having ref-count updates in
>> mlx5e_create_rq is essential for num_channels configuration changes
>> (mlx5e_set_ringparam).
>>
>> Our previous approach made sure that only one path will do the ref
>> counting (mlx5e_open_locked vs. mlx5e_xdp_set batch ref update when
>> reset == false).
>
>
> That is correct, for a short time bpf_prog_add() was charged also when
> we reset. When we reset, we will then jump to unlock_put and safely undo
> this since we took ref from mlx5e_create_rq() already in that case, and
> return successfully. That was intentional, so that eventually we end up
> just taking a /single/ ref per RQ when we exit mlx5e_xdp_set(), but more
> below ...
>
> [...]
>>
>> 2. Keep the current approach and make sure to not update the ref count
>> twice, you can batch update only if (!reset && was_open) otherwise you
>> can rely on mlx5e_open_locked to take the num_channels ref count for
>> you.
>>
>> Personally I prefer option 2, since we will keep the current logic
>> which will allow configuration updates such as (mlx5e_set_ringparam)
>> to not worry about ref counting since it will be done in the reset
>> flow.
>
>
> ... agree on keeping current approach. I actually like the idea, so we'd
> end up with this simpler version for the batched ref then.
>

Great :).

So let's do the batched update only if we are not going to reset (we
already know that in advance), instead of the current patch where you
batch update unconditionally and then
 unlock_put in case reset was performed (which is just redundant and confusing).

Please note that if open_locked fails you need to goto unlock_put.

> Note, your "bpf_prog_add(prog, 1) // one ref for the device." is not needed
> since this we already got that one through dev_change_xdp_fd() as mentioned.
>

If it is not needed then why we need bpf_prog_put on mlx5e_nic_cleanup
in your next patch? this doesn't look symmetric (right) !
you shouldn't release a reference you didn't take.
Overall with this series the driver can take num_channels refs and
will release num_channels refs on mlx5e_close. we shouldn't release
one extra ref on NIC cleanup.
Daniel Borkmann Nov. 16, 2016, 3:26 p.m. UTC | #2
On 11/16/2016 03:30 PM, Saeed Mahameed wrote:
> On Wed, Nov 16, 2016 at 3:54 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 11/16/2016 01:25 PM, Saeed Mahameed wrote:
>>> On Wed, Nov 16, 2016 at 2:04 AM, Daniel Borkmann <daniel@iogearbox.net>
>>> wrote:
>>>>
>>>> There are multiple issues in mlx5e_xdp_set():
>>>>
>>>> 1) The batched bpf_prog_add() is currently not checked for errors! When
>>>>      doing so, it should be done at an earlier point in time to makes sure
>>>>      that we cannot fail anymore at the time we want to set the program
>>>> for
>>>>      each channel. This only means that we have to undo the bpf_prog_add()
>>>>      in case we return due to required reset.
>>>>
>>>> 2) When swapping the priv->xdp_prog, then no extra reference count must
>>>> be
>>>>      taken since we got that from call path via dev_change_xdp_fd()
>>>> already.
>>>>      Otherwise, we'd never be able to free the program. Also,
>>>> bpf_prog_add()
>>>>      without checking the return code could fail.
>>>>
>>>> Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
>>>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>>>> ---
>>>>    drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 25
>>>> ++++++++++++++++++-----
>>>>    1 file changed, 20 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>> b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>> index ab0c336..cf26672 100644
>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>> @@ -3142,6 +3142,17 @@ static int mlx5e_xdp_set(struct net_device
>>>> *netdev, struct bpf_prog *prog)
>>>>                   goto unlock;
>>>>           }
>>>>
>>>> +       if (prog) {
>>>> +               /* num_channels is invariant here, so we can take the
>>>> +                * batched reference right upfront.
>>>> +                */
>>>> +               prog = bpf_prog_add(prog, priv->params.num_channels);
>>>> +               if (IS_ERR(prog)) {
>>>> +                       err = PTR_ERR(prog);
>>>> +                       goto unlock;
>>>> +               }
>>>> +       }
>>>> +
>>>
>>> With this approach you will end up taking a ref count twice per RQ! on
>>> the first time xdp_set is called i.e (old_prog = NULL, prog != NULL).
>>> One ref will be taken per RQ/Channel from the above code, and since
>>> reset will be TRUE mlx5e_open_locked will be called and another ref
>>> count will be taken on mlx5e_create_rq.
>>>
>>> The issue here is that we have two places for ref count accounting,
>>> xdp_set and  mlx5e_create_rq. Having ref-count updates in
>>> mlx5e_create_rq is essential for num_channels configuration changes
>>> (mlx5e_set_ringparam).
>>>
>>> Our previous approach made sure that only one path will do the ref
>>> counting (mlx5e_open_locked vs. mlx5e_xdp_set batch ref update when
>>> reset == false).
>>
>> That is correct, for a short time bpf_prog_add() was charged also when
>> we reset. When we reset, we will then jump to unlock_put and safely undo
>> this since we took ref from mlx5e_create_rq() already in that case, and
>> return successfully. That was intentional, so that eventually we end up
>> just taking a /single/ ref per RQ when we exit mlx5e_xdp_set(), but more
>> below ...
>>
>> [...]
>>>
>>> 2. Keep the current approach and make sure to not update the ref count
>>> twice, you can batch update only if (!reset && was_open) otherwise you
>>> can rely on mlx5e_open_locked to take the num_channels ref count for
>>> you.
>>>
>>> Personally I prefer option 2, since we will keep the current logic
>>> which will allow configuration updates such as (mlx5e_set_ringparam)
>>> to not worry about ref counting since it will be done in the reset
>>> flow.
>>
>> ... agree on keeping current approach. I actually like the idea, so we'd
>> end up with this simpler version for the batched ref then.
>
> Great :).
>
> So let's do the batched update only if we are not going to reset (we
> already know that in advance), instead of the current patch where you
> batch update unconditionally and then
>   unlock_put in case reset was performed (which is just redundant and confusing).
>
> Please note that if open_locked fails you need to goto unlock_put.

Sorry, I'm not quite sure I follow you here; are you /now/ commenting on
the original patch or on the already updated diff I did from my very last
email, that is, http://patchwork.ozlabs.org/patch/695564/ ?

>> Note, your "bpf_prog_add(prog, 1) // one ref for the device." is not needed
>> since this we already got that one through dev_change_xdp_fd() as mentioned.
>
> If it is not needed then why we need bpf_prog_put on mlx5e_nic_cleanup
> in your next patch? this doesn't look symmetric (right) !
> you shouldn't release a reference you didn't take.
> Overall with this series the driver can take num_channels refs and
> will release num_channels refs on mlx5e_close. we shouldn't release
> one extra ref on NIC cleanup.

I already explained this in the commit description; when dev_change_xdp_fd()
is called and sees a valid fd, it does bpf_prog_get_type(), which calls the
__bpf_prog_get() taking a ref on the program (bpf_prog_inc()). That is then
passed down to ops->ndo_xdp(). Only in case of error from the ->ndo_xdp()
callback, we bpf_prog_put() this reference from dev_change_xdp_fd() side.

For drivers that implement against this ndo, it means that you need N-1 refs
in addition. Have a look at the other two in-tree users, which do it correctly,
that is mlx4_xdp_set() and nfp_net_xdp_setup().

It's documented here (include/linux/netdevice.h) with ndo_xdp referring to it:

/* These structures hold the attributes of xdp state that are being passed
  * to the netdevice through the xdp op.
  */
enum xdp_netdev_command {
	/* Set or clear a bpf program used in the earliest stages of packet
	 * rx. The prog will have been loaded as BPF_PROG_TYPE_XDP. The callee
	 * is responsible for calling bpf_prog_put on any old progs that are
	 * stored. In case of error, the callee need not release the new prog
	 * reference, but on success it takes ownership and must bpf_prog_put
	 * when it is no longer used.
	 */
	XDP_SETUP_PROG,
[...]
};

I think reason why this is rather confusing would be that initially, it was
just a single prog for all queues, but it was requested along the way to move
prog pointer down to queues instead and let them have their own ref, so that
some time in future individual progs from a subset of the queues can be
exchanged.

Since the way xdp in mlx5 is implemented, you currently have the priv->xdp_prog
as the control prog. That's okay right now, but requires to drop the last ref
on priv->xdp_prog via bpf_prog_put() when netdev is dismantled and priv->xdp_prog
still present.
Saeed Mahameed Nov. 17, 2016, 9:48 a.m. UTC | #3
On Wed, Nov 16, 2016 at 5:26 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 11/16/2016 03:30 PM, Saeed Mahameed wrote:
>>
>> On Wed, Nov 16, 2016 at 3:54 PM, Daniel Borkmann <daniel@iogearbox.net>
>> wrote:
>>>
>>> On 11/16/2016 01:25 PM, Saeed Mahameed wrote:
>>>>
>>>> On Wed, Nov 16, 2016 at 2:04 AM, Daniel Borkmann <daniel@iogearbox.net>
>>>> wrote:
>>>>>
>>>>>
>>>>> There are multiple issues in mlx5e_xdp_set():
>>>>>
>>>>> 1) The batched bpf_prog_add() is currently not checked for errors! When
>>>>>      doing so, it should be done at an earlier point in time to makes
>>>>> sure
>>>>>      that we cannot fail anymore at the time we want to set the program
>>>>> for
>>>>>      each channel. This only means that we have to undo the
>>>>> bpf_prog_add()
>>>>>      in case we return due to required reset.
>>>>>
>>>>> 2) When swapping the priv->xdp_prog, then no extra reference count must
>>>>> be
>>>>>      taken since we got that from call path via dev_change_xdp_fd()
>>>>> already.
>>>>>      Otherwise, we'd never be able to free the program. Also,
>>>>> bpf_prog_add()
>>>>>      without checking the return code could fail.
>>>>>
>>>>> Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs
>>>>> support")
>>>>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>>>>> ---
>>>>>    drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 25
>>>>> ++++++++++++++++++-----
>>>>>    1 file changed, 20 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>>> b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>>> index ab0c336..cf26672 100644
>>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>>> @@ -3142,6 +3142,17 @@ static int mlx5e_xdp_set(struct net_device
>>>>> *netdev, struct bpf_prog *prog)
>>>>>                   goto unlock;
>>>>>           }
>>>>>
>>>>> +       if (prog) {
>>>>> +               /* num_channels is invariant here, so we can take the
>>>>> +                * batched reference right upfront.
>>>>> +                */
>>>>> +               prog = bpf_prog_add(prog, priv->params.num_channels);
>>>>> +               if (IS_ERR(prog)) {
>>>>> +                       err = PTR_ERR(prog);
>>>>> +                       goto unlock;
>>>>> +               }
>>>>> +       }
>>>>> +
>>>>
>>>>
>>>> With this approach you will end up taking a ref count twice per RQ! on
>>>> the first time xdp_set is called i.e (old_prog = NULL, prog != NULL).
>>>> One ref will be taken per RQ/Channel from the above code, and since
>>>> reset will be TRUE mlx5e_open_locked will be called and another ref
>>>> count will be taken on mlx5e_create_rq.
>>>>
>>>> The issue here is that we have two places for ref count accounting,
>>>> xdp_set and  mlx5e_create_rq. Having ref-count updates in
>>>> mlx5e_create_rq is essential for num_channels configuration changes
>>>> (mlx5e_set_ringparam).
>>>>
>>>> Our previous approach made sure that only one path will do the ref
>>>> counting (mlx5e_open_locked vs. mlx5e_xdp_set batch ref update when
>>>> reset == false).
>>>
>>>
>>> That is correct, for a short time bpf_prog_add() was charged also when
>>> we reset. When we reset, we will then jump to unlock_put and safely undo
>>> this since we took ref from mlx5e_create_rq() already in that case, and
>>> return successfully. That was intentional, so that eventually we end up
>>> just taking a /single/ ref per RQ when we exit mlx5e_xdp_set(), but more
>>> below ...
>>>
>>> [...]
>>>>
>>>>
>>>> 2. Keep the current approach and make sure to not update the ref count
>>>> twice, you can batch update only if (!reset && was_open) otherwise you
>>>> can rely on mlx5e_open_locked to take the num_channels ref count for
>>>> you.
>>>>
>>>> Personally I prefer option 2, since we will keep the current logic
>>>> which will allow configuration updates such as (mlx5e_set_ringparam)
>>>> to not worry about ref counting since it will be done in the reset
>>>> flow.
>>>
>>>
>>> ... agree on keeping current approach. I actually like the idea, so we'd
>>> end up with this simpler version for the batched ref then.
>>
>>
>> Great :).
>>
>> So let's do the batched update only if we are not going to reset (we
>> already know that in advance), instead of the current patch where you
>> batch update unconditionally and then
>>   unlock_put in case reset was performed (which is just redundant and
>> confusing).
>>
>> Please note that if open_locked fails you need to goto unlock_put.
>
>
> Sorry, I'm not quite sure I follow you here; are you /now/ commenting on
> the original patch or on the already updated diff I did from my very last
> email, that is, http://patchwork.ozlabs.org/patch/695564/ ?
>

I am commenting on this patch  "V2 2/4".
this patch will take batched ref count unconditionally and release the
extra refs "unlock_put" if reset was performed.
I am asking to take the batched ref only if we are not going to reset
in advance to save the extra "unlock_put"


>>> Note, your "bpf_prog_add(prog, 1) // one ref for the device." is not
>>> needed
>>> since this we already got that one through dev_change_xdp_fd() as
>>> mentioned.
>>
>>
>> If it is not needed then why we need bpf_prog_put on mlx5e_nic_cleanup
>> in your next patch? this doesn't look symmetric (right) !
>> you shouldn't release a reference you didn't take.
>> Overall with this series the driver can take num_channels refs and
>> will release num_channels refs on mlx5e_close. we shouldn't release
>> one extra ref on NIC cleanup.
>
>
> I already explained this in the commit description; when dev_change_xdp_fd()
> is called and sees a valid fd, it does bpf_prog_get_type(), which calls the
> __bpf_prog_get() taking a ref on the program (bpf_prog_inc()). That is then
> passed down to ops->ndo_xdp(). Only in case of error from the ->ndo_xdp()
> callback, we bpf_prog_put() this reference from dev_change_xdp_fd() side.
>
> For drivers that implement against this ndo, it means that you need N-1 refs
> in addition. Have a look at the other two in-tree users, which do it
> correctly,
> that is mlx4_xdp_set() and nfp_net_xdp_setup().
>
> It's documented here (include/linux/netdevice.h) with ndo_xdp referring to
> it:
>
> /* These structures hold the attributes of xdp state that are being passed
>  * to the netdevice through the xdp op.
>  */
> enum xdp_netdev_command {
>         /* Set or clear a bpf program used in the earliest stages of packet
>          * rx. The prog will have been loaded as BPF_PROG_TYPE_XDP. The
> callee
>          * is responsible for calling bpf_prog_put on any old progs that are
>          * stored. In case of error, the callee need not release the new
> prog
>          * reference, but on success it takes ownership and must
> bpf_prog_put
>          * when it is no longer used.
>          */
>         XDP_SETUP_PROG,
> [...]
> };
>
> I think reason why this is rather confusing would be that initially, it was
> just a single prog for all queues, but it was requested along the way to
> move
> prog pointer down to queues instead and let them have their own ref, so that
> some time in future individual progs from a subset of the queues can be
> exchanged.
>
> Since the way xdp in mlx5 is implemented, you currently have the
> priv->xdp_prog
> as the control prog. That's okay right now, but requires to drop the last
> ref
> on priv->xdp_prog via bpf_prog_put() when netdev is dismantled and
> priv->xdp_prog
> still present.

Got it, but i would rather make the stack cleanup those refs once it
receives an unregester_netdev event instead of counting on netdev to
do it, as i said before, like any other resource added to the netdev
by the stack (vlan/mac/etc..).

but this is a general discussion of xdp API, it won't block this series :).

Saeed.
Daniel Borkmann Nov. 17, 2016, 8:03 p.m. UTC | #4
On 11/17/2016 10:48 AM, Saeed Mahameed wrote:
> On Wed, Nov 16, 2016 at 5:26 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 11/16/2016 03:30 PM, Saeed Mahameed wrote:
>>> On Wed, Nov 16, 2016 at 3:54 PM, Daniel Borkmann <daniel@iogearbox.net>
>>> wrote:
>>>> On 11/16/2016 01:25 PM, Saeed Mahameed wrote:
>>>>> On Wed, Nov 16, 2016 at 2:04 AM, Daniel Borkmann <daniel@iogearbox.net>
>>>>> wrote:
>>>>>>
>>>>>> There are multiple issues in mlx5e_xdp_set():
>>>>>>
>>>>>> 1) The batched bpf_prog_add() is currently not checked for errors! When
>>>>>>       doing so, it should be done at an earlier point in time to makes
>>>>>> sure
>>>>>>       that we cannot fail anymore at the time we want to set the program
>>>>>> for
>>>>>>       each channel. This only means that we have to undo the
>>>>>> bpf_prog_add()
>>>>>>       in case we return due to required reset.
>>>>>>
>>>>>> 2) When swapping the priv->xdp_prog, then no extra reference count must
>>>>>> be
>>>>>>       taken since we got that from call path via dev_change_xdp_fd()
>>>>>> already.
>>>>>>       Otherwise, we'd never be able to free the program. Also,
>>>>>> bpf_prog_add()
>>>>>>       without checking the return code could fail.
>>>>>>
>>>>>> Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs
>>>>>> support")
>>>>>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>>>>>> ---
>>>>>>     drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 25
>>>>>> ++++++++++++++++++-----
>>>>>>     1 file changed, 20 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>>>> b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>>>> index ab0c336..cf26672 100644
>>>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>>>> @@ -3142,6 +3142,17 @@ static int mlx5e_xdp_set(struct net_device
>>>>>> *netdev, struct bpf_prog *prog)
>>>>>>                    goto unlock;
>>>>>>            }
>>>>>>
>>>>>> +       if (prog) {
>>>>>> +               /* num_channels is invariant here, so we can take the
>>>>>> +                * batched reference right upfront.
>>>>>> +                */
>>>>>> +               prog = bpf_prog_add(prog, priv->params.num_channels);
>>>>>> +               if (IS_ERR(prog)) {
>>>>>> +                       err = PTR_ERR(prog);
>>>>>> +                       goto unlock;
>>>>>> +               }
>>>>>> +       }
>>>>>> +
>>>>>
>>>>> With this approach you will end up taking a ref count twice per RQ! on
>>>>> the first time xdp_set is called i.e (old_prog = NULL, prog != NULL).
>>>>> One ref will be taken per RQ/Channel from the above code, and since
>>>>> reset will be TRUE mlx5e_open_locked will be called and another ref
>>>>> count will be taken on mlx5e_create_rq.
>>>>>
>>>>> The issue here is that we have two places for ref count accounting,
>>>>> xdp_set and  mlx5e_create_rq. Having ref-count updates in
>>>>> mlx5e_create_rq is essential for num_channels configuration changes
>>>>> (mlx5e_set_ringparam).
>>>>>
>>>>> Our previous approach made sure that only one path will do the ref
>>>>> counting (mlx5e_open_locked vs. mlx5e_xdp_set batch ref update when
>>>>> reset == false).
>>>>
>>>> That is correct, for a short time bpf_prog_add() was charged also when
>>>> we reset. When we reset, we will then jump to unlock_put and safely undo
>>>> this since we took ref from mlx5e_create_rq() already in that case, and
>>>> return successfully. That was intentional, so that eventually we end up
>>>> just taking a /single/ ref per RQ when we exit mlx5e_xdp_set(), but more
>>>> below ...
>>>>
>>>> [...]
>>>>>
>>>>> 2. Keep the current approach and make sure to not update the ref count
>>>>> twice, you can batch update only if (!reset && was_open) otherwise you
>>>>> can rely on mlx5e_open_locked to take the num_channels ref count for
>>>>> you.
>>>>>
>>>>> Personally I prefer option 2, since we will keep the current logic
>>>>> which will allow configuration updates such as (mlx5e_set_ringparam)
>>>>> to not worry about ref counting since it will be done in the reset
>>>>> flow.
>>>>
>>>> ... agree on keeping current approach. I actually like the idea, so we'd
>>>> end up with this simpler version for the batched ref then.
>>>
>>> Great :).
>>>
>>> So let's do the batched update only if we are not going to reset (we
>>> already know that in advance), instead of the current patch where you
>>> batch update unconditionally and then
>>>    unlock_put in case reset was performed (which is just redundant and
>>> confusing).
>>>
>>> Please note that if open_locked fails you need to goto unlock_put.
>>
>> Sorry, I'm not quite sure I follow you here; are you /now/ commenting on
>> the original patch or on the already updated diff I did from my very last
>> email, that is, http://patchwork.ozlabs.org/patch/695564/ ?
>
> I am commenting on this patch  "V2 2/4".
> this patch will take batched ref count unconditionally and release the
> extra refs "unlock_put" if reset was performed.
> I am asking to take the batched ref only if we are not going to reset
> in advance to save the extra "unlock_put"

Okay, sure, it's clear and we discussed about that already earlier, and
my response back then was the diff in the /end/ of this email here:

   https://www.spinics.net/lists/netdev/msg404709.html

Hence my confusion why we were back on the original patch after that. ;)
Anyway, I'll get the updated set with the suggestion out tomorrow, thanks
for the review!

>>>> Note, your "bpf_prog_add(prog, 1) // one ref for the device." is not
>>>> needed
>>>> since this we already got that one through dev_change_xdp_fd() as
>>>> mentioned.
>>>
>>> If it is not needed then why we need bpf_prog_put on mlx5e_nic_cleanup
>>> in your next patch? this doesn't look symmetric (right) !
>>> you shouldn't release a reference you didn't take.
>>> Overall with this series the driver can take num_channels refs and
>>> will release num_channels refs on mlx5e_close. we shouldn't release
>>> one extra ref on NIC cleanup.
>>
>> I already explained this in the commit description; when dev_change_xdp_fd()
>> is called and sees a valid fd, it does bpf_prog_get_type(), which calls the
>> __bpf_prog_get() taking a ref on the program (bpf_prog_inc()). That is then
>> passed down to ops->ndo_xdp(). Only in case of error from the ->ndo_xdp()
>> callback, we bpf_prog_put() this reference from dev_change_xdp_fd() side.
>>
>> For drivers that implement against this ndo, it means that you need N-1 refs
>> in addition. Have a look at the other two in-tree users, which do it
>> correctly,
>> that is mlx4_xdp_set() and nfp_net_xdp_setup().
>>
>> It's documented here (include/linux/netdevice.h) with ndo_xdp referring to
>> it:
>>
>> /* These structures hold the attributes of xdp state that are being passed
>>   * to the netdevice through the xdp op.
>>   */
>> enum xdp_netdev_command {
>>          /* Set or clear a bpf program used in the earliest stages of packet
>>           * rx. The prog will have been loaded as BPF_PROG_TYPE_XDP. The
>> callee
>>           * is responsible for calling bpf_prog_put on any old progs that are
>>           * stored. In case of error, the callee need not release the new
>> prog
>>           * reference, but on success it takes ownership and must
>> bpf_prog_put
>>           * when it is no longer used.
>>           */
>>          XDP_SETUP_PROG,
>> [...]
>> };
>>
>> I think reason why this is rather confusing would be that initially, it was
>> just a single prog for all queues, but it was requested along the way to
>> move
>> prog pointer down to queues instead and let them have their own ref, so that
>> some time in future individual progs from a subset of the queues can be
>> exchanged.
>>
>> Since the way xdp in mlx5 is implemented, you currently have the
>> priv->xdp_prog
>> as the control prog. That's okay right now, but requires to drop the last
>> ref
>> on priv->xdp_prog via bpf_prog_put() when netdev is dismantled and
>> priv->xdp_prog
>> still present.
>
> Got it, but i would rather make the stack cleanup those refs once it
> receives an unregester_netdev event instead of counting on netdev to
> do it, as i said before, like any other resource added to the netdev
> by the stack (vlan/mac/etc..).
>
> but this is a general discussion of xdp API, it won't block this series :).
>
> Saeed.
diff mbox

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index ab0c336..09ac27c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3148,11 +3148,21 @@  static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)

  	if (was_opened && reset)
  		mlx5e_close_locked(netdev);
+	if (was_opened && !reset) {
+		/* num_channels is invariant here, so we can take the
+		 * batched reference right upfront.
+		 */
+		prog = bpf_prog_add(prog, priv->params.num_channels);
+		if (IS_ERR(prog)) {
+			err = PTR_ERR(prog);
+			goto unlock;
+		}
+	}

-	/* exchange programs */
+	/* exchange programs, extra prog reference we got from caller
+	 * as long as we don't fail from this point onwards.
+	 */
  	old_prog = xchg(&priv->xdp_prog, prog);
-	if (prog)
-		bpf_prog_add(prog, 1);
  	if (old_prog)
  		bpf_prog_put(old_prog);

@@ -3168,7 +3178,6 @@  static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
  	/* exchanging programs w/o reset, we update ref counts on behalf
  	 * of the channels RQs here.
  	 */
-	bpf_prog_add(prog, priv->params.num_channels);
  	for (i = 0; i < priv->params.num_channels; i++) {
  		struct mlx5e_channel *c = priv->channel[i];