[net-next,4/7] packet: rollover lock contention avoidance

Message ID 1430941450.14545.84.camel@edumazet-glaptop2.roam.corp.google.com
State RFC, archived
Delegated to: David Miller

Commit Message

Eric Dumazet May 6, 2015, 7:44 p.m. UTC
On Wed, 2015-05-06 at 14:27 -0400, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 

> @@ -3718,6 +3726,10 @@ static unsigned int packet_poll(struct file *file, struct socket *sock,
>  			mask |= POLLOUT | POLLWRNORM;
>  	}
>  	spin_unlock_bh(&sk->sk_write_queue.lock);
> +
> +	if (po->pressure && !(mask & POLLIN))
> +		xchg(&po->pressure, 0);
> +
>  	return mask;

This looks racy to me: several threads could run here, and one thread
could remove pressure just set by another.

Also, waiting for the queue to become completely empty before releasing
pressure leaves prompt reuse of the queue at the mercy of scheduling.

(We usually use max_occupancy/2 thresholds)
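
As a rough sketch of that kind of hysteresis (the struct and helper
below are hypothetical illustrations, not af_packet code):

	struct ring_state {
		unsigned int capacity;	/* total frames in the ring */
		unsigned int used;	/* frames awaiting userspace */
		bool pressure;
	};

	/* Set pressure when the ring is full, but clear it only once
	 * occupancy drops below half capacity, so a single dequeued
	 * frame cannot flip the state straight back.
	 */
	static bool update_pressure(struct ring_state *r)
	{
		if (r->used == r->capacity)
			r->pressure = true;
		else if (r->used < r->capacity / 2)
			r->pressure = false;
		/* between half and full: keep the previous state */
		return r->pressure;
	}

With a 256-frame ring, for example, pressure is raised at 256 frames in
flight and released only once fewer than 128 remain.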

Maybe this would be better, but please check.




Comments

Willem de Bruijn May 6, 2015, 9:05 p.m. UTC | #1
On Wed, May 6, 2015 at 3:44 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2015-05-06 at 14:27 -0400, Willem de Bruijn wrote:
>> From: Willem de Bruijn <willemb@google.com>
>>
>
>> @@ -3718,6 +3726,10 @@ static unsigned int packet_poll(struct file *file, struct socket *sock,
>>                       mask |= POLLOUT | POLLWRNORM;
>>       }
>>       spin_unlock_bh(&sk->sk_write_queue.lock);
>> +
>> +     if (po->pressure && !(mask & POLLIN))
>> +             xchg(&po->pressure, 0);
>> +
>>       return mask;
>
> This looks racy to me: several threads could run here, and one thread
> could remove pressure just set by another.
>
> Also, waiting for the queue to become completely empty before releasing
> pressure leaves prompt reuse of the queue at the mercy of scheduling.
>
> (We usually use max_occupancy/2 thresholds)

Yes, half would be better. I could not figure out a way to test for
anything besides empty without taking the receive queue lock. But as your
change points out, packet_poll already takes that lock. I can call a
lockless variant of packet_rcv_has_room right there, then (iff po->pressure).
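
Perhaps something along these lines: a lockless check that assumes the
caller already holds sk_receive_queue.lock, with the half-full threshold
folded in. This is only a sketch against the non-ring rcvbuf accounting
(a ring variant would inspect frame status instead), and the helper name
and return values are illustrative, not taken from the patch:

	#define ROOM_NONE	0
	#define ROOM_LOW	1	/* above half of rcvbuf */
	#define ROOM_NORMAL	2

	/* Caller must hold sk_receive_queue.lock, so sk_rmem_alloc
	 * cannot race with a concurrent enqueue.
	 */
	static int __packet_rcv_has_room(const struct sock *sk)
	{
		int used = atomic_read(&sk->sk_rmem_alloc);

		if (used >= sk->sk_rcvbuf)
			return ROOM_NONE;
		if (used > (sk->sk_rcvbuf >> 1))
			return ROOM_LOW;
		return ROOM_NORMAL;
	}

and then, inside the locked section of packet_poll():

	if (po->pressure && __packet_rcv_has_room(sk) == ROOM_NORMAL)
		xchg(&po->pressure, 0);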

> Maybe this would be better, but please check.
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 5102c3cc4eec..fab59f8bb336 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -3694,6 +3694,8 @@ static unsigned int packet_poll(struct file *file, struct socket *sock,
>                         TP_STATUS_KERNEL))
>                         mask |= POLLIN | POLLRDNORM;
>         }
> +       if (po->pressure && !(mask & POLLIN))
> +               xchg(&po->pressure, 0);
>         spin_unlock_bh(&sk->sk_receive_queue.lock);
>         spin_lock_bh(&sk->sk_write_queue.lock);
>         if (po->tx_ring.pg_vec) {
>

That would fix it for rings. I'll move the check there.
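
The interleaving being closed, roughly (CPU labels illustrative):

	/*
	 * Racy version: the clear runs after sk_receive_queue.lock is
	 * released, so it can wipe out pressure set in between:
	 *
	 *   CPU0 (packet_poll)               CPU1 (receive path)
	 *   finds queue drained, no POLLIN
	 *   unlocks sk_receive_queue.lock
	 *                                    queue fills up again
	 *                                    sets po->pressure = 1
	 *   xchg(&po->pressure, 0)           <- fresh pressure lost
	 *
	 * With the clear inside the locked section, the POLLIN test and
	 * the clear see a single consistent queue state: pressure is
	 * only dropped while the queue is provably not full.
	 */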

Patch

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 5102c3cc4eec..fab59f8bb336 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3694,6 +3694,8 @@ static unsigned int packet_poll(struct file *file, struct socket *sock,
                        TP_STATUS_KERNEL))
                        mask |= POLLIN | POLLRDNORM;
        }
+       if (po->pressure && !(mask & POLLIN))
+               xchg(&po->pressure, 0);
        spin_unlock_bh(&sk->sk_receive_queue.lock);
        spin_lock_bh(&sk->sk_write_queue.lock);
        if (po->tx_ring.pg_vec) {