Message ID: 1271300513-9995-1-git-send-email-xiaosuo@gmail.com
State: Superseded, archived
Delegated to: David Miller
On Thu, Apr 15, 2010 at 11:01 AM, Changli Gao <xiaosuo@gmail.com> wrote:
> fix softnet_stat
>
> The per-cpu variable softnet_data.total was shared between IRQ and SoftIRQ
> context without any protection. And enqueue_to_backlog should update the
> netdev_rx_stat of the target CPU.
>
> This patch changes the meaning of softnet_data.total to the number of packets
> processed, not including dropped ones, in order to avoid sharing it between
> IRQ and SoftIRQ. And softnet_data.dropped is protected by the corresponding
> input_pkt_queue.lock, if RPS is enabled.

I forgot a comment: this patch also fixes a bug. Packets received through
enqueue_to_backlog() were counted twice, once in enqueue_to_backlog() and once
in __netif_receive_skb(), although the two increments may not happen on the
same CPU.
On Thursday 15 April 2010 at 11:01 +0800, Changli Gao wrote:
> fix softnet_stat
>
> Per cpu variable softnet_data.total was shared between IRQ and SoftIRQ context
> without any protection. And enqueue_to_backlog should update the netdev_rx_stat
> of the target CPU.
>
> This patch changes the meaning of softnet_data.total to the number of packets
> processed, not including dropped ones, in order to avoid sharing it between
> IRQ and SoftIRQ. And softnet_data.dropped is protected by the corresponding
> input_pkt_queue.lock, if RPS is enabled.
>
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> ---
>  net/core/dev.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index a10a216..b38a991 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2336,7 +2336,6 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu)
>  	queue = &per_cpu(softnet_data, cpu);
>
>  	local_irq_save(flags);
> -	__get_cpu_var(netdev_rx_stat).total++;

I think this was nice, because we can see the number of frames received by
this cpu before giving them to another cpu. Maybe we want a new counter?

	__get_cpu_var(netdev_rx_stat).inpackets++;

>
>  	rps_lock(queue);
>  	if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
> @@ -2366,9 +2365,9 @@ enqueue:
>  		goto enqueue;
>  	}
>
> +	per_cpu(netdev_rx_stat, cpu).dropped++;

This is a bit expensive, and could be a queue->rx_stat.dropped++ if
netdev_rx_stat is moved inside the queue structure.

>  	rps_unlock(queue);
>
> -	__get_cpu_var(netdev_rx_stat).dropped++;
>  	local_irq_restore(flags);
>
>  	kfree_skb(skb);
> @@ -3565,7 +3564,7 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
>  	struct netif_rx_stats *s = v;
>
>  	seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
> -		   s->total, s->dropped, s->time_squeeze, 0,
> +		   s->total + s->dropped, s->dropped, s->time_squeeze, 0,

This is wrong, I believe... I prefer to add a new column, s->inpackets?

>  		   0, 0, 0, 0, /* was fastroute */
>  		   s->cpu_collision, s->received_rps);
>  	return 0;
> --

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thursday 15 April 2010 at 05:28 +0200, Eric Dumazet wrote:
> > +	per_cpu(netdev_rx_stat, cpu).dropped++;
>
> This is a bit expensive, and could be a queue->rx_stat.dropped++
> if netdev_rx_stat is moved inside the queue structure.

Hmm, this is not really needed; this event is in the slow path and should
almost never happen. This part of your fix is fine.
On Thu, Apr 15, 2010 at 11:28 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> ---
>>  net/core/dev.c | 5 ++---
>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index a10a216..b38a991 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -2336,7 +2336,6 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu)
>>  	queue = &per_cpu(softnet_data, cpu);
>>
>>  	local_irq_save(flags);
>> -	__get_cpu_var(netdev_rx_stat).total++;
>
> I think this was nice, because we can see the number of frames received by
> this cpu before giving them to another cpu.

:( Packets will be counted again in __netif_receive_skb(). Now the meaning of
total is confusing.

> Maybe we want a new counter ?
>
> 	__get_cpu_var(netdev_rx_stat).inpackets++;

How about replacing total with received?

>>  	rps_lock(queue);
>>  	if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
>> @@ -2366,9 +2365,9 @@ enqueue:
>>  		goto enqueue;
>>  	}
>>
>> +	per_cpu(netdev_rx_stat, cpu).dropped++;
>
> This is a bit expensive, and could be a queue->rx_stat.dropped++
> if netdev_rx_stat is moved inside the queue structure.

Won't that make softnet_data larger?

>>  	rps_unlock(queue);
>>
>> -	__get_cpu_var(netdev_rx_stat).dropped++;
>>  	local_irq_restore(flags);
>>
>>  	kfree_skb(skb);
>> @@ -3565,7 +3564,7 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
>>  	struct netif_rx_stats *s = v;
>>
>>  	seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
>> -		   s->total, s->dropped, s->time_squeeze, 0,
>> +		   s->total + s->dropped, s->dropped, s->time_squeeze, 0,
>
> This is wrong, I believe... I prefer to add a new column, s->inpackets?

If I rename total to received, is it still confusing?
On Thu, Apr 15, 2010 at 11:40 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thursday 15 April 2010 at 05:28 +0200, Eric Dumazet wrote:
>> > +	per_cpu(netdev_rx_stat, cpu).dropped++;
>>
>> This is a bit expensive, and could be a queue->rx_stat.dropped++
>> if netdev_rx_stat is moved inside the queue structure.
>
> Hmm, this is not really needed; this event is in the slow path and should
> almost never happen. This part of your fix is fine.

Is there any documentation about /proc/net/softnet_stat?
On Thu, Apr 15, 2010 at 11:28 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> I think this was nice, because we can see the number of frames received by
> this cpu before giving them to another cpu.
>
> Maybe we want a new counter ?
>
> 	__get_cpu_var(netdev_rx_stat).inpackets++;
>
>>  	rps_lock(queue);
>>  	if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
>> @@ -2366,9 +2365,9 @@ enqueue:
>>  		goto enqueue;
>>  	}
>>
>> +	per_cpu(netdev_rx_stat, cpu).dropped++;
>
> This is a bit expensive, and could be a queue->rx_stat.dropped++
> if netdev_rx_stat is moved inside the queue structure.
>
>>  	rps_unlock(queue);
>>
>> -	__get_cpu_var(netdev_rx_stat).dropped++;
>>  	local_irq_restore(flags);
>>
>>  	kfree_skb(skb);
>> @@ -3565,7 +3564,7 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
>>  	struct netif_rx_stats *s = v;
>>
>>  	seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
>> -		   s->total, s->dropped, s->time_squeeze, 0,
>> +		   s->total + s->dropped, s->dropped, s->time_squeeze, 0,
>
> This is wrong, I believe... I prefer to add a new column, s->inpackets?

The first column, total, should be the number of packets processed by this
CPU, as it was before RPS was involved. If RPS is enabled, it should keep that
old meaning. If new statistics are needed for RPS, we could add them in
another patch.
On Thursday 15 April 2010 at 11:41 +0800, Changli Gao wrote:
> On Thu, Apr 15, 2010 at 11:28 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >> @@ -2336,7 +2336,6 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu)
> >>  	queue = &per_cpu(softnet_data, cpu);
> >>
> >>  	local_irq_save(flags);
> >> -	__get_cpu_var(netdev_rx_stat).total++;
> >
> > I think this was nice, because we can see the number of frames received by
> > this cpu before giving them to another cpu.
>
> :( Packets will be counted again in __netif_receive_skb(). Now the
> meaning of total is confusing.

Yes, this should be documented somewhere :)

> > Maybe we want a new counter ?
> >
> > 	__get_cpu_var(netdev_rx_stat).inpackets++;
>
> How about replacing total with received?
>
> >> +	per_cpu(netdev_rx_stat, cpu).dropped++;
> >
> > This is a bit expensive, and could be a queue->rx_stat.dropped++
> > if netdev_rx_stat is moved inside the queue structure.
>
> It will make softnet_data larger?

We move a per_cpu structure inside another one. No extra bytes needed.

> >>  	seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
> >> -		   s->total, s->dropped, s->time_squeeze, 0,
> >> +		   s->total + s->dropped, s->dropped, s->time_squeeze, 0,
> >
> > This is wrong, I believe... I prefer to add a new column, s->inpackets?
>
> If I rename total to received, is it still confusing?

Keep in mind /proc/net/softnet_stat is an ABI, and the first column had the
meaning: number of packets processed by this cpu. But that was pre-RPS time.
Now we have a two-phase process. It makes sense to:

- Let 'total' be the counter of packets processed in the upper levels (IP/TCP
  stack). As /proc/net/softnet_stat has no header, the 'total' name is not
  important and could be changed.
- Add a new column (after the existing ones in the /proc file, to keep ABI
  compatibility) with the meaning: number of packets pre-processed before the
  network stacks.

That gives:

total: number of packets processed in upper layers (the IP stack, for example)
dropped: number of packets dropped because this cpu's queue was full
time_squeeze: number of times the receive softirq ran out of time on this cpu
cpu_collision: ...
received_rps: number of times this cpu received an RPS signal from another cpu
inpackets: number of packets processed in layer 1 by this cpu (RPS level)
On Thursday 15 April 2010 at 05:40 +0200, Eric Dumazet wrote:
> On Thursday 15 April 2010 at 05:28 +0200, Eric Dumazet wrote:
> > > +	per_cpu(netdev_rx_stat, cpu).dropped++;
> >
> > This is a bit expensive, and could be a queue->rx_stat.dropped++
> > if netdev_rx_stat is moved inside the queue structure.
>
> Hmm, this is not really needed; this event is in the slow path and should
> almost never happen. This part of your fix is fine.

Oh well, I revert this comment: we need mutual exclusion between cpus if we
don't want to miss some increments, even in a slow path.

	__get_cpu_var(netdev_rx_stat).field++;

is safe because it is updated only by this cpu. Once you use

	per_cpu(netdev_rx_stat, cpu).field++;

you need synchronization, since this is not atomic. So queue->dropped++ is
safer, and faster too (inside the rps_lock section).
On Thu, Apr 15, 2010 at 12:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Oh well, I revert this comment: we need mutual exclusion between cpus if we
> don't want to miss some increments, even in a slow path.
>
> 	__get_cpu_var(netdev_rx_stat).field++;
>
> is safe because it is updated only by this cpu. Once you use
>
> 	per_cpu(netdev_rx_stat, cpu).field++;
>
> you need synchronization, since this is not atomic.

rps_unlock() is a synchronization point, if one is needed.

> So queue->dropped++ is safer, and faster too (inside the rps_lock section).

It involves more lines of code: moving netdev_rx_stat into softnet_data.
diff --git a/net/core/dev.c b/net/core/dev.c
index a10a216..b38a991 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2336,7 +2336,6 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu)
 	queue = &per_cpu(softnet_data, cpu);
 
 	local_irq_save(flags);
-	__get_cpu_var(netdev_rx_stat).total++;
 
 	rps_lock(queue);
 	if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
@@ -2366,9 +2365,9 @@ enqueue:
 		goto enqueue;
 	}
 
+	per_cpu(netdev_rx_stat, cpu).dropped++;
 	rps_unlock(queue);
 
-	__get_cpu_var(netdev_rx_stat).dropped++;
 	local_irq_restore(flags);
 
 	kfree_skb(skb);
@@ -3565,7 +3564,7 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
 	struct netif_rx_stats *s = v;
 
 	seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
-		   s->total, s->dropped, s->time_squeeze, 0,
+		   s->total + s->dropped, s->dropped, s->time_squeeze, 0,
 		   0, 0, 0, 0, /* was fastroute */
 		   s->cpu_collision, s->received_rps);
 	return 0;
fix softnet_stat

The per-cpu variable softnet_data.total was shared between IRQ and SoftIRQ
context without any protection. And enqueue_to_backlog should update the
netdev_rx_stat of the target CPU.

This patch changes the meaning of softnet_data.total to the number of packets
processed, not including dropped ones, in order to avoid sharing it between
IRQ and SoftIRQ. And softnet_data.dropped is protected by the corresponding
input_pkt_queue.lock, if RPS is enabled.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
---
 net/core/dev.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)