Message ID | 1338971724.2760.3913.camel@edumazet-glaptop |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
On Wed, 2012-06-06 at 10:35 +0200, Eric Dumazet wrote: > From: Eric Dumazet <edumazet@google.com> > > commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race > on 32bit arches. > > We must use separate syncp for rx and tx path as they can be run at the > same time on different cpus. Thus one sequence increment can be lost and > readers spin forever. > > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Stephen Hemminger <shemminger@vyatta.com> > Cc: Michael S. Tsirkin <mst@redhat.com> > Cc: Jason Wang <jasowang@redhat.com> > --- Just to make clear : even using percpu stats/syncp, we have no guarantee that write_seqcount_begin() is done with one instruction. [1] It is OK on x86 if "incl" instruction is generated by the compiler, but on a RISC cpu, the "load memory,%reg ; inc %reg ; store %reg,memory" can be interrupted. So if you are 100% sure all paths are safe against preemption/BH, then this patch is not needed, but a big comment in the code would avoid adding possible races in the future. [1] If done with one instruction, we still have a race, since a reader might see an even sequence and conclude no writer is inside the critical section. So read values could be wrong. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 06/06/2012 04:45 PM, Eric Dumazet wrote: > On Wed, 2012-06-06 at 10:35 +0200, Eric Dumazet wrote: >> From: Eric Dumazet<edumazet@google.com> >> >> commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race >> on 32bit arches. >> >> We must use separate syncp for rx and tx path as they can be run at the >> same time on different cpus. Thus one sequence increment can be lost and >> readers spin forever. >> >> Signed-off-by: Eric Dumazet<edumazet@google.com> >> Cc: Stephen Hemminger<shemminger@vyatta.com> >> Cc: Michael S. Tsirkin<mst@redhat.com> >> Cc: Jason Wang<jasowang@redhat.com> >> --- > Just to make clear : even using percpu stats/syncp, we have no guarantee > that write_seqcount_begin() is done with one instruction. [1] > > It is OK on x86 if "incl" instruction is generated by the compiler, but > on a RISC cpu, the "load memory,%reg ; inc %reg ; store %reg,memory" can > be interrupted. > > So if you are 100% sure all paths are safe against preemption/BH, then > this patch is not needed, but a big comment in the code would avoid > adding possible races in the future. Thanks for explaing, current virtio-net is safe I think. But the patch is still needed as my patch would update the statistics in irq. > > [1] If done with one instruction, we still have a race, since a reader > might see an even sequence and conclude no writer is inside the critical > section. So read values could be wrong. > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jun 06, 2012 at 10:45:41AM +0200, Eric Dumazet wrote: > On Wed, 2012-06-06 at 10:35 +0200, Eric Dumazet wrote: > > From: Eric Dumazet <edumazet@google.com> > > > > commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race > > on 32bit arches. > > > > We must use separate syncp for rx and tx path as they can be run at the > > same time on different cpus. Thus one sequence increment can be lost and > > readers spin forever. > > > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > Cc: Stephen Hemminger <shemminger@vyatta.com> > > Cc: Michael S. Tsirkin <mst@redhat.com> > > Cc: Jason Wang <jasowang@redhat.com> > > --- > > Just to make clear : even using percpu stats/syncp, we have no guarantee > that write_seqcount_begin() is done with one instruction. [1] > > It is OK on x86 if "incl" instruction is generated by the compiler, but > on a RISC cpu, the "load memory,%reg ; inc %reg ; store %reg,memory" can > be interrupted. > > So if you are 100% sure all paths are safe against preemption/BH, then > this patch is not needed, but a big comment in the code would avoid > adding possible races in the future. We currently do all stats either on napi callback or from start_xmit callback. This makes them safe, yes? > [1] If done with one instruction, we still have a race, since a reader > might see an even sequence and conclude no writer is inside the critical > section. So read values could be wrong. > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 06 Jun 2012 10:45:41 +0200, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Wed, 2012-06-06 at 10:35 +0200, Eric Dumazet wrote: > > From: Eric Dumazet <edumazet@google.com> > > > > commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race > > on 32bit arches. > > > > We must use separate syncp for rx and tx path as they can be run at the > > same time on different cpus. Thus one sequence increment can be lost and > > readers spin forever. > > > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > Cc: Stephen Hemminger <shemminger@vyatta.com> > > Cc: Michael S. Tsirkin <mst@redhat.com> > > Cc: Jason Wang <jasowang@redhat.com> > > --- > > Just to make clear : even using percpu stats/syncp, we have no guarantee > that write_seqcount_begin() is done with one instruction. [1] > > It is OK on x86 if "incl" instruction is generated by the compiler, but > on a RISC cpu, the "load memory,%reg ; inc %reg ; store %reg,memory" can > be interrupted. > > So if you are 100% sure all paths are safe against preemption/BH, then > this patch is not needed, but a big comment in the code would avoid > adding possible races in the future. Too fragile; let's keep them separate as per this patch. Acked-by: Rusty Russell <rusty@rustcorp.com.au> Thanks, Rusty. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Jun 10, 2012 at 04:06:34PM +0930, Rusty Russell wrote: > On Wed, 06 Jun 2012 10:45:41 +0200, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > On Wed, 2012-06-06 at 10:35 +0200, Eric Dumazet wrote: > > > From: Eric Dumazet <edumazet@google.com> > > > > > > commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race > > > on 32bit arches. > > > > > > We must use separate syncp for rx and tx path as they can be run at the > > > same time on different cpus. Thus one sequence increment can be lost and > > > readers spin forever. > > > > > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > > Cc: Stephen Hemminger <shemminger@vyatta.com> > > > Cc: Michael S. Tsirkin <mst@redhat.com> > > > Cc: Jason Wang <jasowang@redhat.com> > > > --- > > > > Just to make clear : even using percpu stats/syncp, we have no guarantee > > that write_seqcount_begin() is done with one instruction. [1] > > > > It is OK on x86 if "incl" instruction is generated by the compiler, but > > on a RISC cpu, the "load memory,%reg ; inc %reg ; store %reg,memory" can > > be interrupted. > > > > So if you are 100% sure all paths are safe against preemption/BH, then > > this patch is not needed, but a big comment in the code would avoid > > adding possible races in the future. > > Too fragile; let's keep them separate as per this patch. > > Acked-by: Rusty Russell <rusty@rustcorp.com.au> > > Thanks, > Rusty. One question though: do we want to lay the structure out so that the rx sync structure precedes the rx counters?
On Sun, 2012-06-10 at 10:03 +0300, Michael S. Tsirkin wrote: > One question though: do we want to lay the structure > out so that the rx sync structure precedes the rx counters? > I am not sure its worth having holes in the structure, since its percpu data. That would be 8 bytes lost per cpu and per device. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Jun 10, 2012 at 12:21:14PM +0200, Eric Dumazet wrote: > On Sun, 2012-06-10 at 10:03 +0300, Michael S. Tsirkin wrote: > > > One question though: do we want to lay the structure > > out so that the rx sync structure precedes the rx counters? > > > > I am not sure its worth having holes in the structure, since its percpu > data. > > That would be 8 bytes lost per cpu and per device. Right, I forgot it's per cpu. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jun 06, 2012 at 10:35:24AM +0200, Eric Dumazet wrote: > From: Eric Dumazet <edumazet@google.com> > > commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race > on 32bit arches. > > We must use separate syncp for rx and tx path as they can be run at the > same time on different cpus. Thus one sequence increment can be lost and > readers spin forever. > > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Stephen Hemminger <shemminger@vyatta.com> > Cc: Michael S. Tsirkin <mst@redhat.com> > Cc: Jason Wang <jasowang@redhat.com> I'm still thinking about moving tx to take a xmit lock long term, meanwhile this fix appears appropriate for 3.5. Acked-by: Michael S. Tsirkin <mst@redhat.com> Dave, can you pick this up pls? > --- > drivers/net/virtio_net.c | 19 ++++++++++++------- > 1 file changed, 12 insertions(+), 7 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index 5214b1e..f18149a 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -42,7 +42,8 @@ module_param(gso, bool, 0444); > #define VIRTNET_DRIVER_VERSION "1.0.0" > > struct virtnet_stats { > - struct u64_stats_sync syncp; > + struct u64_stats_sync tx_syncp; > + struct u64_stats_sync rx_syncp; > u64 tx_bytes; > u64 tx_packets; > > @@ -300,10 +301,10 @@ static void receive_buf(struct net_device *dev, void *buf, unsigned int len) > > hdr = skb_vnet_hdr(skb); > > - u64_stats_update_begin(&stats->syncp); > + u64_stats_update_begin(&stats->rx_syncp); > stats->rx_bytes += skb->len; > stats->rx_packets++; > - u64_stats_update_end(&stats->syncp); > + u64_stats_update_end(&stats->rx_syncp); > > if (hdr->hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) { > pr_debug("Needs csum!\n"); > @@ -565,10 +566,10 @@ static unsigned int free_old_xmit_skbs(struct virtnet_info *vi) > while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) { > pr_debug("Sent skb %p\n", skb); > > - u64_stats_update_begin(&stats->syncp); > + u64_stats_update_begin(&stats->tx_syncp); > stats->tx_bytes += skb->len; > stats->tx_packets++; > - u64_stats_update_end(&stats->syncp); > + u64_stats_update_end(&stats->tx_syncp); > > tot_sgs += skb_vnet_hdr(skb)->num_sg; > dev_kfree_skb_any(skb); > @@ -703,12 +704,16 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev, > u64 tpackets, tbytes, rpackets, rbytes; > > do { > - start = u64_stats_fetch_begin(&stats->syncp); > + start = u64_stats_fetch_begin(&stats->tx_syncp); > tpackets = stats->tx_packets; > tbytes = stats->tx_bytes; > + } while (u64_stats_fetch_retry(&stats->tx_syncp, start)); > + > + do { > + start = u64_stats_fetch_begin(&stats->rx_syncp); > rpackets = stats->rx_packets; > rbytes = stats->rx_bytes; > - } while (u64_stats_fetch_retry(&stats->syncp, start)); > + } while (u64_stats_fetch_retry(&stats->rx_syncp, start)); > > tot->rx_packets += rpackets; > tot->tx_packets += tpackets; > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: "Michael S. Tsirkin" <mst@redhat.com> Date: Sun, 10 Jun 2012 13:25:12 +0300 > On Wed, Jun 06, 2012 at 10:35:24AM +0200, Eric Dumazet wrote: >> From: Eric Dumazet <edumazet@google.com> >> >> commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race >> on 32bit arches. >> >> We must use separate syncp for rx and tx path as they can be run at the >> same time on different cpus. Thus one sequence increment can be lost and >> readers spin forever. >> >> Signed-off-by: Eric Dumazet <edumazet@google.com> >> Cc: Stephen Hemminger <shemminger@vyatta.com> >> Cc: Michael S. Tsirkin <mst@redhat.com> >> Cc: Jason Wang <jasowang@redhat.com> > > I'm still thinking about moving tx to take a xmit lock long term, > meanwhile this fix appears appropriate for 3.5. > > Acked-by: Michael S. Tsirkin <mst@redhat.com> > > Dave, can you pick this up pls? Done, thanks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 5214b1e..f18149a 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -42,7 +42,8 @@ module_param(gso, bool, 0444); #define VIRTNET_DRIVER_VERSION "1.0.0" struct virtnet_stats { - struct u64_stats_sync syncp; + struct u64_stats_sync tx_syncp; + struct u64_stats_sync rx_syncp; u64 tx_bytes; u64 tx_packets; @@ -300,10 +301,10 @@ static void receive_buf(struct net_device *dev, void *buf, unsigned int len) hdr = skb_vnet_hdr(skb); - u64_stats_update_begin(&stats->syncp); + u64_stats_update_begin(&stats->rx_syncp); stats->rx_bytes += skb->len; stats->rx_packets++; - u64_stats_update_end(&stats->syncp); + u64_stats_update_end(&stats->rx_syncp); if (hdr->hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) { pr_debug("Needs csum!\n"); @@ -565,10 +566,10 @@ static unsigned int free_old_xmit_skbs(struct virtnet_info *vi) while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) { pr_debug("Sent skb %p\n", skb); - u64_stats_update_begin(&stats->syncp); + u64_stats_update_begin(&stats->tx_syncp); stats->tx_bytes += skb->len; stats->tx_packets++; - u64_stats_update_end(&stats->syncp); + u64_stats_update_end(&stats->tx_syncp); tot_sgs += skb_vnet_hdr(skb)->num_sg; dev_kfree_skb_any(skb); @@ -703,12 +704,16 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev, u64 tpackets, tbytes, rpackets, rbytes; do { - start = u64_stats_fetch_begin(&stats->syncp); + start = u64_stats_fetch_begin(&stats->tx_syncp); tpackets = stats->tx_packets; tbytes = stats->tx_bytes; + } while (u64_stats_fetch_retry(&stats->tx_syncp, start)); + + do { + start = u64_stats_fetch_begin(&stats->rx_syncp); rpackets = stats->rx_packets; rbytes = stats->rx_bytes; - } while (u64_stats_fetch_retry(&stats->syncp, start)); + } while (u64_stats_fetch_retry(&stats->rx_syncp, start)); tot->rx_packets += rpackets; tot->tx_packets += tpackets;