
net: fix softnet_stat

Message ID 1271300513-9995-1-git-send-email-xiaosuo@gmail.com
State Superseded, archived
Delegated to: David Miller

Commit Message

Changli Gao April 15, 2010, 3:01 a.m. UTC
fix softnet_stat

The per-CPU counter netdev_rx_stat.total was shared between IRQ and SoftIRQ context
without any protection. In addition, enqueue_to_backlog() should update the
netdev_rx_stat of the target CPU.

This patch changes the meaning of netdev_rx_stat.total to the number of packets
processed, not including dropped ones, so that the counter is no longer shared
between IRQ and SoftIRQ context. netdev_rx_stat.dropped is protected by the
corresponding input_pkt_queue.lock when RPS is enabled.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 net/core/dev.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Comments

Changli Gao April 15, 2010, 3:08 a.m. UTC | #1
On Thu, Apr 15, 2010 at 11:01 AM, Changli Gao <xiaosuo@gmail.com> wrote:
> fix softnet_stat
>
> The per-CPU counter netdev_rx_stat.total was shared between IRQ and SoftIRQ context
> without any protection. In addition, enqueue_to_backlog() should update the
> netdev_rx_stat of the target CPU.
>
> This patch changes the meaning of netdev_rx_stat.total to the number of packets
> processed, not including dropped ones, so that the counter is no longer shared
> between IRQ and SoftIRQ context. netdev_rx_stat.dropped is protected by the
> corresponding input_pkt_queue.lock when RPS is enabled.
>

I forgot to add a comment:

This patch also fixes a bug: packets received by enqueue_to_backlog()
were counted twice, once in enqueue_to_backlog() and once in
__netif_receive_skb(), even though the two may not run on the same CPU.
Eric Dumazet April 15, 2010, 3:28 a.m. UTC | #2
On Thursday, April 15, 2010 at 11:01 +0800, Changli Gao wrote:
> fix softnet_stat
> 
> The per-CPU counter netdev_rx_stat.total was shared between IRQ and SoftIRQ context
> without any protection. In addition, enqueue_to_backlog() should update the
> netdev_rx_stat of the target CPU.
>
> This patch changes the meaning of netdev_rx_stat.total to the number of packets
> processed, not including dropped ones, so that the counter is no longer shared
> between IRQ and SoftIRQ context. netdev_rx_stat.dropped is protected by the
> corresponding input_pkt_queue.lock when RPS is enabled.
> 
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> ----
>  net/core/dev.c |    5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> diff --git a/net/core/dev.c b/net/core/dev.c
> index a10a216..b38a991 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2336,7 +2336,6 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu)
>  	queue = &per_cpu(softnet_data, cpu);
>  
>  	local_irq_save(flags);
> -	__get_cpu_var(netdev_rx_stat).total++;

I think this was nice, because we can see number of frames received by
this cpu, before giving them to another cpu.

Maybe we want a new counter ?

	__get_cpu_var(netdev_rx_stat).inpackets++;
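
For reference, a minimal sketch of what that suggested counter could look like
(the inpackets field and its placement are hypothetical, not part of the posted
patch):

	/* include/linux/netdevice.h (sketch) */
	struct netif_rx_stats {
		unsigned total;		/* packets handed to the upper layers on this cpu */
		unsigned inpackets;	/* hypothetical: frames seen by this cpu before RPS steering */
		unsigned dropped;
		unsigned time_squeeze;
		unsigned cpu_collision;
		unsigned received_rps;
	};

	/* net/core/dev.c, enqueue_to_backlog() (sketch) */
	local_irq_save(flags);
	__get_cpu_var(netdev_rx_stat).inpackets++;	/* counted on the cpu that took the interrupt */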


>  
>  	rps_lock(queue);
>  	if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
> @@ -2366,9 +2365,9 @@ enqueue:
>  		goto enqueue;
>  	}
>  
> +	per_cpu(netdev_rx_stat, cpu).dropped++;

This is a bit expensive, and could be a queue->rx_stat.dropped++
if netdev_rx_stat is moved inside queue structure.


>  	rps_unlock(queue);
>  
> -	__get_cpu_var(netdev_rx_stat).dropped++;
>  	local_irq_restore(flags);
>  
>  	kfree_skb(skb);
> @@ -3565,7 +3564,7 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
>  	struct netif_rx_stats *s = v;
>  
>  	seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
> -		   s->total, s->dropped, s->time_squeeze, 0,
> +		   s->total + s->dropped, s->dropped, s->time_squeeze, 0,

This is wrong I believe... I prefer to add a new column, s->inpackets ?

>  		   0, 0, 0, 0, /* was fastroute */
>  		   s->cpu_collision, s->received_rps);
>  	return 0;
> --



Eric Dumazet April 15, 2010, 3:40 a.m. UTC | #3
On Thursday, April 15, 2010 at 05:28 +0200, Eric Dumazet wrote:
> >  
> > +	per_cpu(netdev_rx_stat, cpu).dropped++;
> 
> This is a bit expensive, and could be a queue->rx_stat.dropped++
> if netdev_rx_stat is moved inside queue structure.

Hmm, this is not really needed, this event is the slow path and should
almost never happen. This part of your fix is fine.


Changli Gao April 15, 2010, 3:41 a.m. UTC | #4
On Thu, Apr 15, 2010 at 11:28 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> ----
>>  net/core/dev.c |    5 ++---
>>  1 file changed, 2 insertions(+), 3 deletions(-)
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index a10a216..b38a991 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -2336,7 +2336,6 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu)
>>       queue = &per_cpu(softnet_data, cpu);
>>
>>       local_irq_save(flags);
>> -     __get_cpu_var(netdev_rx_stat).total++;
>
> I think this was nice, because we can see number of frames received by
> this cpu, before giving them to another cpu.
>

:( Packets will be counted again in __netif_receive_skb(). Now the
meaning of total is confusing.

> Maybe we want a new counter ?
>
>        __get_cpu_var(netdev_rx_stat).inpackets++;
>

How about replacing total with received?

>>
>>       rps_lock(queue);
>>       if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
>> @@ -2366,9 +2365,9 @@ enqueue:
>>               goto enqueue;
>>       }
>>
>> +     per_cpu(netdev_rx_stat, cpu).dropped++;
>
> This is a bit expensive, and could be a queue->rx_stat.dropped++
> if netdev_rx_stat is moved inside queue structure.
>

It will make softnet_data larger, won't it?

>
>>       rps_unlock(queue);
>>
>> -     __get_cpu_var(netdev_rx_stat).dropped++;
>>       local_irq_restore(flags);
>>
>>       kfree_skb(skb);
>> @@ -3565,7 +3564,7 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
>>       struct netif_rx_stats *s = v;
>>
>>       seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
>> -                s->total, s->dropped, s->time_squeeze, 0,
>> +                s->total + s->dropped, s->dropped, s->time_squeeze, 0,
>
> This is wrong I believe... I prefer to add a new column, s->inpackets ?

If I rename total to received, is it still confusing?
Changli Gao April 15, 2010, 3:45 a.m. UTC | #5
On Thu, Apr 15, 2010 at 11:40 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thursday, April 15, 2010 at 05:28 +0200, Eric Dumazet wrote:
>> >
>> > +   per_cpu(netdev_rx_stat, cpu).dropped++;
>>
>> This is a bit expensive, and could be a queue->rx_stat.dropped++
>> if netdev_rx_stat is moved inside queue structure.
>
> Hmm, this is not really needed, this event is the slow path and should
> almost never happen. This part of your fix is fine.
>

Is there any documentation about /proc/net/softnet_stat?
Changli Gao April 15, 2010, 4:01 a.m. UTC | #6
On Thu, Apr 15, 2010 at 11:28 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> I think this was nice, because we can see number of frames received by
> this cpu, before giving them to another cpu.
>
> Maybe we want a new counter ?
>
>        __get_cpu_var(netdev_rx_stat).inpackets++;
>
>
>>
>>       rps_lock(queue);
>>       if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
>> @@ -2366,9 +2365,9 @@ enqueue:
>>               goto enqueue;
>>       }
>>
>> +     per_cpu(netdev_rx_stat, cpu).dropped++;
>
> This is a bit expensive, and could be a queue->rx_stat.dropped++
> if netdev_rx_stat is moved inside queue structure.
>
>
>>       rps_unlock(queue);
>>
>> -     __get_cpu_var(netdev_rx_stat).dropped++;
>>       local_irq_restore(flags);
>>
>>       kfree_skb(skb);
>> @@ -3565,7 +3564,7 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
>>       struct netif_rx_stats *s = v;
>>
>>       seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
>> -                s->total, s->dropped, s->time_squeeze, 0,
>> +                s->total + s->dropped, s->dropped, s->time_squeeze, 0,
>
> This is wrong I believe... I prefer to add a new column, s->inpackets ?
>

The first column, total, should be the number of packets processed by this
CPU, as it was before RPS was involved. If RPS is enabled, it should keep
the old meaning. If RPS needs new statistics, we can add them in another
patch.
Eric Dumazet April 15, 2010, 4:03 a.m. UTC | #7
On Thursday, April 15, 2010 at 11:41 +0800, Changli Gao wrote:
> On Thu, Apr 15, 2010 at 11:28 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >> ----
> >>  net/core/dev.c |    5 ++---
> >>  1 file changed, 2 insertions(+), 3 deletions(-)
> >> diff --git a/net/core/dev.c b/net/core/dev.c
> >> index a10a216..b38a991 100644
> >> --- a/net/core/dev.c
> >> +++ b/net/core/dev.c
> >> @@ -2336,7 +2336,6 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu)
> >>       queue = &per_cpu(softnet_data, cpu);
> >>
> >>       local_irq_save(flags);
> >> -     __get_cpu_var(netdev_rx_stat).total++;
> >
> > I think this was nice, because we can see number of frames received by
> > this cpu, before giving them to another cpu.
> >
> 
> :( Packets will be counted again in __netif_receive_skb(). Now the
> meaning of total is confusing.

Yes, this should be documented somewhere :)

> 
> > Maybe we want a new counter ?
> >
> >        __get_cpu_var(netdev_rx_stat).inpackets++;
> >
> 
> How about replacing total with received?
> 
> >>
> >>       rps_lock(queue);
> >>       if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
> >> @@ -2366,9 +2365,9 @@ enqueue:
> >>               goto enqueue;
> >>       }
> >>
> >> +     per_cpu(netdev_rx_stat, cpu).dropped++;
> >
> > This is a bit expensive, and could be a queue->rx_stat.dropped++
> > if netdev_rx_stat is moved inside queue structure.
> >
> 
> It will make softnet_data larger, won't it?

We move a per-cpu structure inside another one. No extra bytes are needed.

> 
> >
> >>       rps_unlock(queue);
> >>
> >> -     __get_cpu_var(netdev_rx_stat).dropped++;
> >>       local_irq_restore(flags);
> >>
> >>       kfree_skb(skb);
> >> @@ -3565,7 +3564,7 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
> >>       struct netif_rx_stats *s = v;
> >>
> >>       seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
> >> -                s->total, s->dropped, s->time_squeeze, 0,
> >> +                s->total + s->dropped, s->dropped, s->time_squeeze, 0,
> >
> > This is wrong I believe... I prefer to add a new column, s->inpackets ?
> 
> If I rename total to received, is it still confusing?

Keep in mind /proc/net/softnet_stat is an ABI, and the first column had the
meaning: number of packets processed by this cpu.

But that was before RPS. Now we have a two-phase process.


It would make sense to:

let 'total' be the counter of packets processed in the upper layers (IP/TCP
stack). As /proc/net/softnet_stat has no header, the 'total' name is not
important and could be changed.

add a new column (after the existing ones in the /proc file, to keep ABI
compatibility) with the meaning: number of packets pre-processed before the
network stacks.

total: number of packets processed in the upper layers (the IP stack, for
example)
dropped: number of packets dropped because this cpu's queue was full
time_squeeze: number of time-squeeze events on this cpu
cpu_collision: ...
received_rps: number of times this cpu received an RPS signal from another cpu
inpackets: number of packets processed at layer 1 by this cpu (the RPS level)
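
A sketch of how such a column could be appended in softnet_seq_show(), keeping
all existing columns in place (the inpackets field is hypothetical here):

	static int softnet_seq_show(struct seq_file *seq, void *v)
	{
		struct netif_rx_stats *s = v;

		/* existing columns unchanged, hypothetical inpackets column appended last */
		seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
			   s->total, s->dropped, s->time_squeeze, 0,
			   0, 0, 0, 0, /* was fastroute */
			   s->cpu_collision, s->received_rps, s->inpackets);
		return 0;
	}

Appending the new column at the end is what keeps existing parsers of the file
working, as noted above.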


Eric Dumazet April 15, 2010, 4:14 a.m. UTC | #8
On Thursday, April 15, 2010 at 05:40 +0200, Eric Dumazet wrote:
> On Thursday, April 15, 2010 at 05:28 +0200, Eric Dumazet wrote:
> > >  
> > > +	per_cpu(netdev_rx_stat, cpu).dropped++;
> > 
> > This is a bit expensive, and could be a queue->rx_stat.dropped++
> > if netdev_rx_stat is moved inside queue structure.
> 
> Hmm, this is not really needed, this event is the slow path and should
> almost never happen. This part of your fix is fine.
> 
> 
> -

Oh well, I take back that comment: we need mutual exclusion between CPUs
if we don't want to miss some increments, even in a slow path.

__get_cpu_var(netdev_rx_stat).field++; is safe because it is updated only by
this CPU.

Once you use per_cpu(netdev_rx_stat, cpu).field++; you need
synchronization, since this is not atomic.

so queue->dropped++ is safer and faster too (inside the rps_lock
section)
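
A sketch of that alternative, with the drop counter kept inside softnet_data so
the increment is covered by the rps_lock section (the dropped member is
hypothetical; the posted patch keeps it in netdev_rx_stat):

	/* include/linux/netdevice.h (sketch) */
	struct softnet_data {
		/* ... existing members (output_queue, input_pkt_queue, backlog, ...) ... */
		unsigned int		dropped;	/* hypothetical: only updated under rps_lock */
	};

	/* net/core/dev.c, enqueue_to_backlog(), drop path (sketch) */
	queue->dropped++;		/* serialized by rps_lock: no lost updates across cpus */
	rps_unlock(queue);
	local_irq_restore(flags);
	kfree_skb(skb);
	return NET_RX_DROP;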


Changli Gao April 15, 2010, 4:22 a.m. UTC | #9
On Thu, Apr 15, 2010 at 12:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thursday, April 15, 2010 at 05:40 +0200, Eric Dumazet wrote:
>
> Oh well, I take back that comment: we need mutual exclusion between CPUs
> if we don't want to miss some increments, even in a slow path.
>
> __get_cpu_var(netdev_rx_stat).field++; is safe because it is updated only by
> this CPU.
>
> Once you use per_cpu(netdev_rx_stat, cpu).field++; you need
> synchronization, since this is not atomic.
>

The rps_lock()/rps_unlock() pair already provides that synchronization, if any is needed.

> so queue->dropped++ is safer and faster too (inside the rps_lock
> section)
>

It would involve more lines of code, moving netdev_rx_stat into softnet_data.

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index a10a216..b38a991 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2336,7 +2336,6 @@  static int enqueue_to_backlog(struct sk_buff *skb, int cpu)
 	queue = &per_cpu(softnet_data, cpu);
 
 	local_irq_save(flags);
-	__get_cpu_var(netdev_rx_stat).total++;
 
 	rps_lock(queue);
 	if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
@@ -2366,9 +2365,9 @@  enqueue:
 		goto enqueue;
 	}
 
+	per_cpu(netdev_rx_stat, cpu).dropped++;
 	rps_unlock(queue);
 
-	__get_cpu_var(netdev_rx_stat).dropped++;
 	local_irq_restore(flags);
 
 	kfree_skb(skb);
@@ -3565,7 +3564,7 @@  static int softnet_seq_show(struct seq_file *seq, void *v)
 	struct netif_rx_stats *s = v;
 
 	seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
-		   s->total, s->dropped, s->time_squeeze, 0,
+		   s->total + s->dropped, s->dropped, s->time_squeeze, 0,
 		   0, 0, 0, 0, /* was fastroute */
 		   s->cpu_collision, s->received_rps);
 	return 0;