[net-next,1/5] net: Do not record sender_cpu as napi_id in socket receive paths

Message ID 20170316183238.15806.14293.stgit@localhost.localdomain
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Alexander Duyck March 16, 2017, 6:32 p.m. UTC
From: Sridhar Samudrala <sridhar.samudrala@intel.com>

Fix sk_mark_napi_id() and sk_mark_napi_id_once() to set sk_napi_id only if
skb->napi_id is a valid value.

This happens in loopback paths, where skb->napi_id is not updated in the
rx path and still holds the sender_cpu that was set in the xmit path.

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
 include/net/busy_poll.h |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

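For context, skb->napi_id and skb->sender_cpu share the same storage in
struct sk_buff (an anonymous union), so a value recorded as sender_cpu on
the xmit path is read back as napi_id when the skb comes in over loopback.
The toy program below only illustrates that aliasing; it is not kernel code,
and the struct is a stand-in:

	/* Illustration only: napi_id and sender_cpu overlap, much like the
	 * anonymous union in struct sk_buff, so a CPU number written on the
	 * xmit path shows up as a "NAPI ID" on a loopback receive.
	 */
	#include <stdio.h>

	struct skb_like {
		union {
			unsigned int napi_id;    /* set by NAPI drivers in the rx path */
			unsigned int sender_cpu; /* set on the xmit path */
		};
	};

	int main(void)
	{
		struct skb_like skb;

		skb.sender_cpu = 3;                     /* xmit path records the CPU */
		printf("napi_id = %u\n", skb.napi_id);  /* loopback rx sees 3 */
		return 0;
	}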
Comments

Eric Dumazet March 16, 2017, 10:05 p.m. UTC | #1
On Thu, 2017-03-16 at 11:32 -0700, Alexander Duyck wrote:
> From: Sridhar Samudrala <sridhar.samudrala@intel.com>
> 
> Fix sk_mark_napi_id() and sk_mark_napi_id_once() to set sk_napi_id only if
> skb->napi_id is a valid value.
> 
> This happens in loopback paths, where skb->napi_id is not updated in the
> rx path and still holds the sender_cpu that was set in the xmit path.
> 
> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
>  include/net/busy_poll.h |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
> index c0452de83086..67991635953e 100644
> --- a/include/net/busy_poll.h
> +++ b/include/net/busy_poll.h
> @@ -116,7 +116,8 @@ static inline bool sk_busy_loop(struct sock *sk, int nonblock)
>  static inline void sk_mark_napi_id(struct sock *sk, const struct sk_buff *skb)
>  {
>  #ifdef CONFIG_NET_RX_BUSY_POLL
> -	sk->sk_napi_id = skb->napi_id;
> +	if (skb->napi_id > (u32)NR_CPUS)
> +		sk->sk_napi_id = skb->napi_id;
>  #endif
>  }
>  
> @@ -125,7 +126,7 @@ static inline void sk_mark_napi_id_once(struct sock *sk,
>  					const struct sk_buff *skb)
>  {
>  #ifdef CONFIG_NET_RX_BUSY_POLL
> -	if (!sk->sk_napi_id)
> +	if (!sk->sk_napi_id && (skb->napi_id > (u32)NR_CPUS))
>  		sk->sk_napi_id = skb->napi_id;
>  #endif
>  }
> 

It is not clear why this patch is needed .

What you describe here is the case we might receive packets for a socket
coming from different interfaces ?

If skb->napi_id is a sender_cpu, why should we prevent overwriting the
sk_napi_id with it, knowing that busy polling will simply ignore the
invalid value ?

Do not get me wrong :

I simply try to understand why the test about napi_id validity is now
done twice :

1) At the time we are writing into sk->sk_napi_id

2) At busy polling time when we read sk->sk_napi_id
Alexander Duyck March 16, 2017, 10:33 p.m. UTC | #2
On Thu, Mar 16, 2017 at 3:05 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2017-03-16 at 11:32 -0700, Alexander Duyck wrote:
>> From: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>
>> Fix sk_mark_napi_id() and sk_mark_napi_id_once() to set sk_napi_id only if
>> skb->napi_id is a valid value.
>>
>> This happens in loopback paths, where skb->napi_id is not updated in the
>> rx path and still holds the sender_cpu that was set in the xmit path.
>>
>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>> ---
>>  include/net/busy_poll.h |    5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
>> index c0452de83086..67991635953e 100644
>> --- a/include/net/busy_poll.h
>> +++ b/include/net/busy_poll.h
>> @@ -116,7 +116,8 @@ static inline bool sk_busy_loop(struct sock *sk, int nonblock)
>>  static inline void sk_mark_napi_id(struct sock *sk, const struct sk_buff *skb)
>>  {
>>  #ifdef CONFIG_NET_RX_BUSY_POLL
>> -     sk->sk_napi_id = skb->napi_id;
>> +     if (skb->napi_id > (u32)NR_CPUS)
>> +             sk->sk_napi_id = skb->napi_id;
>>  #endif
>>  }
>>
>> @@ -125,7 +126,7 @@ static inline void sk_mark_napi_id_once(struct sock *sk,
>>                                       const struct sk_buff *skb)
>>  {
>>  #ifdef CONFIG_NET_RX_BUSY_POLL
>> -     if (!sk->sk_napi_id)
>> +     if (!sk->sk_napi_id && (skb->napi_id > (u32)NR_CPUS))
>>               sk->sk_napi_id = skb->napi_id;
>>  #endif
>>  }
>>
>
> It is not clear why this patch is needed .
>
> What you describe here is the case we might receive packets for a socket
> coming from different interfaces ?
>
> If skb->napi_id is a sender_cpu, why should we prevent overwriting the
> sk_napi_id with it, knowing that busy polling will simply ignore the
> invalid value ?
>
> Do not get me wrong :
>
> I simply try to understand why the test about napi_id validity is now
> done twice :
>
> 1) At the time we are writing into sk->sk_napi_id

I would argue that this is the one piece we were missing.

> 2) At busy polling time when we read sk->sk_napi_id

Unless there was something recently added I don't think this was ever
checked.  Instead we start digging into the hash looking for the ID
that won't ever be there.  Maybe we should add something to napi_by_id
that just returns NULL in these cases.

On top of that I think there end up being several spots where once we
lock in a non-NAPI ID it is stuck such as the sk_mark_napi_id_once
call.  I figure we are better off locking in an actual NAPI ID rather
than getting a sender_cpu stuck in there.
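The hash lookup referred to above behaves, in simplified form, like the
sketch below: an ID that was never hashed in, such as a sender_cpu value,
just walks a bucket without a match and yields NULL. This is a model for
illustration only, with made-up names and sizes, not the actual
net/core/dev.c implementation:

	#include <stddef.h>

	#define NAPI_HASH_BUCKETS 256		/* illustrative size only */

	struct napi_entry {
		unsigned int napi_id;
		struct napi_entry *next;
	};

	static struct napi_entry *napi_hash[NAPI_HASH_BUCKETS];

	/* Simplified model of the ID lookup: hash the ID, walk the bucket,
	 * return NULL on a miss. A sender_cpu value was never added to the
	 * hash, so it always misses.
	 */
	static struct napi_entry *napi_lookup(unsigned int napi_id)
	{
		struct napi_entry *e;

		for (e = napi_hash[napi_id % NAPI_HASH_BUCKETS]; e; e = e->next)
			if (e->napi_id == napi_id)
				return e;
		return NULL;
	}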
Samudrala, Sridhar March 16, 2017, 10:41 p.m. UTC | #3
On 3/16/2017 3:05 PM, Eric Dumazet wrote:
> On Thu, 2017-03-16 at 11:32 -0700, Alexander Duyck wrote:
>> From: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>
>> Fix sk_mark_napi_id() and sk_mark_napi_id_once() to set sk_napi_id only if
>> skb->napi_id is a valid value.
>>
>> This happens in loopback paths, where skb->napi_id is not updated in the
>> rx path and still holds the sender_cpu that was set in the xmit path.
>>
>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>> ---
>>   include/net/busy_poll.h |    5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
>> index c0452de83086..67991635953e 100644
>> --- a/include/net/busy_poll.h
>> +++ b/include/net/busy_poll.h
>> @@ -116,7 +116,8 @@ static inline bool sk_busy_loop(struct sock *sk, int nonblock)
>>   static inline void sk_mark_napi_id(struct sock *sk, const struct sk_buff *skb)
>>   {
>>   #ifdef CONFIG_NET_RX_BUSY_POLL
>> -	sk->sk_napi_id = skb->napi_id;
>> +	if (skb->napi_id > (u32)NR_CPUS)
>> +		sk->sk_napi_id = skb->napi_id;
>>   #endif
>>   }
>>   
>> @@ -125,7 +126,7 @@ static inline void sk_mark_napi_id_once(struct sock *sk,
>>   					const struct sk_buff *skb)
>>   {
>>   #ifdef CONFIG_NET_RX_BUSY_POLL
>> -	if (!sk->sk_napi_id)
>> +	if (!sk->sk_napi_id && (skb->napi_id > (u32)NR_CPUS))
>>   		sk->sk_napi_id = skb->napi_id;
>>   #endif
>>   }
>>
> It is not clear why this patch is needed .
>
> What you describe here is the case we might receive packets for a socket
> coming from different interfaces ?

This is seen with AF_UNIX or AF_INET sockets over loopback.
>
> If skb->napi_id is a sender_cpu, why should we prevent overwriting the
> sk_napi_id with it, knowing that busy polling will simply ignore the
> invalid value ?

We are not checking for an invalid value (<= NR_CPUS) in busy_poll;
any non-zero sk->sk_napi_id is considered valid.

If we don't want to add this check while setting sk->sk_napi_id, we
could change the check in ep_set_busy_poll_napi_id() to test for an
invalid value rather than a non-zero value.

>
> Do not get me wrong :
>
> I simply try to understand why the test about napi_id validity is now
> done twice :
>
> 1) At the time we are writing into sk->sk_napi_id
>
> 2) At busy polling time when we read sk->sk_napi_id
>
>
>
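A minimal sketch of the reader-side alternative mentioned above: instead of
treating any non-zero sk_napi_id as usable, the consumer would test the
value against NR_CPUS. The helper name and the NR_CPUS stub below are
illustrative, not existing kernel API:

	#include <stdbool.h>

	#define NR_CPUS 64	/* stand-in for the kernel's compile-time constant */

	/* Hypothetical reader-side test: sender_cpu values fall in the
	 * [1, NR_CPUS] range, so anything at or below NR_CPUS is treated as
	 * "no usable NAPI ID".
	 */
	static inline bool napi_id_is_valid(unsigned int napi_id)
	{
		return napi_id > (unsigned int)NR_CPUS;
	}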
Eric Dumazet March 16, 2017, 10:50 p.m. UTC | #4
On Thu, 2017-03-16 at 15:33 -0700, Alexander Duyck wrote:
> On Thu, Mar 16, 2017 at 3:05 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:

> > It is not clear why this patch is needed .
> >
> > What you describe here is the case we might receive packets for a socket
> > coming from different interfaces ?
> >
> > If skb->napi_id is a sender_cpu, why should we prevent overwriting the
> > sk_napi_id with it, knowing that busy polling will simply ignore the
> > invalid value ?
> >
> > Do not get me wrong :
> >
> > I simply try to understand why the test about napi_id validity is now
> > done twice :
> >
> > 1) At the time we are writing into sk->sk_napi_id
> 
> I would argue that this is the one piece we were missing.
> 
> > 2) At busy polling time when we read sk->sk_napi_id
> 
> Unless there was something recently added I don't think this was ever
> checked.  Instead we start digging into the hash looking for the ID
> that won't ever be there.  Maybe we should add something to napi_by_id
> that just returns NULL in these cases.

But this is exactly what should happen.

For invalid ID, we return NULL from napi_by_id()

No need to add code for that, since the function is meant to deal with
valid cases.



> On top of that I think there end up being several spots where once we
> lock in a non-NAPI ID it is stuck such as the sk_mark_napi_id_once
> call.  I figure we are better off locking in an actual NAPI ID rather
> than getting a sender_cpu stuck in there.

Are you referring to sk_mark_napi_id_once() ?

Since this is only used by UDP, I would be OK to avoid the 'locking' for
'sender_cpu'  ids.
Alexander Duyck March 17, 2017, 2:40 a.m. UTC | #5
On Thu, Mar 16, 2017 at 3:50 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2017-03-16 at 15:33 -0700, Alexander Duyck wrote:
>> On Thu, Mar 16, 2017 at 3:05 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>> > It is not clear why this patch is needed .
>> >
>> > What you describe here is the case we might receive packets for a socket
>> > coming from different interfaces ?
>> >
>> > If skb->napi_id is a sender_cpu, why should we prevent overwriting the
>> > sk_napi_id with it, knowing that busy polling will simply ignore the
>> > invalid value ?
>> >
>> > Do not get me wrong :
>> >
>> > I simply try to understand why the test about napi_id validity is now
>> > done twice :
>> >
>> > 1) At the time we are writing into sk->sk_napi_id
>>
>> I would argue that this is the one piece we were missing.
>>
>> > 2) At busy polling time when we read sk->sk_napi_id
>>
>> Unless there was something recently added I don't think this was ever
>> checked.  Instead we start digging into the hash looking for the ID
>> that won't ever be there.  Maybe we should add something to napi_by_id
>> that just returns NULL in these cases.
>
> But this is exactly what should happen.
>
> For invalid ID, we return NULL from napi_by_id()
>
> No need to add code for that, since the function is meant to deal with
> valid cases.

I don't know.  My concern here is about the cost of going through all
that code just for something that we know shouldn't be valid.  If
nothing else I might update sk_can_busy_loop so that it doesn't try
busy looping on a sk_napi_id that is NR_CPUS or less.

>> On top of that I think there end up being several spots where once we
>> lock in a non-NAPI ID it is stuck such as the sk_mark_napi_id_once
>> call.  I figure we are better off locking in an actual NAPI ID rather
>> than getting a sender_cpu stuck in there.
>
> Are you referring to sk_mark_napi_id_once() ?
>
> Since this is only used by UDP, I would be OK to avoid the 'locking' for
> 'sender_cpu'  ids.

What I probably can do is go through and replace all the spots where
we were checking for sk_napi_id being 0, and instead replace it with
a check against NR_CPUS.
Eric Dumazet March 17, 2017, 2:55 a.m. UTC | #6
On Thu, 2017-03-16 at 19:40 -0700, Alexander Duyck wrote:

> I don't know.  My concern here is about the cost of going through all
> that code just for something that we know shouldn't be valid.  If
> nothing else I might update sk_can_busy_loop so that it doesn't try
> busy looping on a sk_napi_id that is NR_CPUS or less.

But why would that be a win ?

If napi_by_id() returns NULL, we immediately give up (goto out;).

So why should we add code that will not be useful for the vast majority
of cases, where the ID is valid and we need to do the hash lookup ?

Is libc trying to avoid doing syscalls like close(-1) ?
Eric Dumazet March 17, 2017, 2:57 a.m. UTC | #7
On Thu, 2017-03-16 at 19:40 -0700, Alexander Duyck wrote:

> What I probably can do is go through and replace all the spots where
> we were checking for sk_napi_id being 0, and instead replace it with
> a check against NR_CPUS.

This seems a good idea.
Alexander Duyck March 17, 2017, 2:59 a.m. UTC | #8
On Thu, Mar 16, 2017 at 7:57 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2017-03-16 at 19:40 -0700, Alexander Duyck wrote:
>
>> What I probably can do is go through and replace all the spots where
>> we were checking for sk_napi_id being 0, and instead replace it with
>> a check against NR_CPUS.
>
> This seems a good idea.

Right.  This is the path I am planning to go with.  It will require
about the same amount of code change but will replace those checks.  I
just have to make sure I catch all the spots where we were checking it
for 0, but that shouldn't be too difficult.
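As a rough sketch of the direction settled on above: wherever sk_napi_id was
previously tested for being non-zero, the test would instead be against
NR_CPUS, so sender_cpu values recorded over loopback are never treated as
usable NAPI IDs. This is an illustration under that assumption, with a
stubbed NR_CPUS and made-up helper names, not the actual follow-up patch:

	#include <stdbool.h>

	#define NR_CPUS 64	/* stand-in for the kernel's compile-time constant */

	struct sock_like {
		unsigned int sk_napi_id;
	};

	/* IDs in [0, NR_CPUS] are either "unset" or a sender_cpu recorded over
	 * loopback; only IDs above NR_CPUS identify a real NAPI context.
	 */
	static inline bool napi_id_usable(unsigned int napi_id)
	{
		return napi_id > (unsigned int)NR_CPUS;
	}

	/* Write side: record the ID only when it is a genuine NAPI ID. */
	static inline void mark_napi_id(struct sock_like *sk, unsigned int skb_napi_id)
	{
		if (napi_id_usable(skb_napi_id))
			sk->sk_napi_id = skb_napi_id;
	}

	/* Read side: replaces the old "if (!sk->sk_napi_id)" style of check. */
	static inline bool can_busy_loop(const struct sock_like *sk)
	{
		return napi_id_usable(sk->sk_napi_id);
	}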

Patch

diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index c0452de83086..67991635953e 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -116,7 +116,8 @@  static inline bool sk_busy_loop(struct sock *sk, int nonblock)
 static inline void sk_mark_napi_id(struct sock *sk, const struct sk_buff *skb)
 {
 #ifdef CONFIG_NET_RX_BUSY_POLL
-	sk->sk_napi_id = skb->napi_id;
+	if (skb->napi_id > (u32)NR_CPUS)
+		sk->sk_napi_id = skb->napi_id;
 #endif
 }
 
@@ -125,7 +126,7 @@  static inline void sk_mark_napi_id_once(struct sock *sk,
 					const struct sk_buff *skb)
 {
 #ifdef CONFIG_NET_RX_BUSY_POLL
-	if (!sk->sk_napi_id)
+	if (!sk->sk_napi_id && (skb->napi_id > (u32)NR_CPUS))
 		sk->sk_napi_id = skb->napi_id;
 #endif
 }