Message ID | 1338988210.2760.4485.camel@edumazet-glaptop |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Wed, Jun 06, 2012 at 03:10:10PM +0200, Eric Dumazet wrote: > On Wed, 2012-06-06 at 14:13 +0300, Michael S. Tsirkin wrote: > > > We currently do all stats either on napi callback or from > > start_xmit callback. > > This makes them safe, yes? > > Hmm, then _bh() variant is needed in virtnet_stats(), as explained in > include/linux/u64_stats_sync.h section 6) > > * 6) If counter might be written by an interrupt, readers should block interrupts. > * (On UP, there is no seqcount_t protection, a reader allowing interrupts could > * read partial values) > > Yes, its tricky... Sounds good, but I have a question: this realies on counters being atomic on 64 bit. Would not it be better to always use a seqlock even on 64 bit? This way counters would actually be correct and in sync. As it is if we want e.g. average packet size, we can not rely e.g. on it being bytes/packets. > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index 5214b1e..705aaa7 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -703,12 +703,12 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev, > u64 tpackets, tbytes, rpackets, rbytes; > > do { > - start = u64_stats_fetch_begin(&stats->syncp); > + start = u64_stats_fetch_begin_bh(&stats->syncp); > tpackets = stats->tx_packets; > tbytes = stats->tx_bytes; > rpackets = stats->rx_packets; > rbytes = stats->rx_bytes; > - } while (u64_stats_fetch_retry(&stats->syncp, start)); > + } while (u64_stats_fetch_retry_bh(&stats->syncp, start)); > > tot->rx_packets += rpackets; > tot->tx_packets += tpackets; > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 6 Jun 2012 17:49:42 +0300 "Michael S. Tsirkin" <mst@redhat.com> wrote: > Sounds good, but I have a question: this realies on counters > being atomic on 64 bit. > Would not it be better to always use a seqlock even on 64 bit? > This way counters would actually be correct and in sync. > As it is if we want e.g. average packet size, > we can not rely e.g. on it being bytes/packets. This has not been a requirement on real physical devices; therefore the added overhead is not really justified. Many network cards use counters in hardware to count packets/bytes and there is no expectation of atomic access there. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2012-06-06 at 17:49 +0300, Michael S. Tsirkin wrote: > On Wed, Jun 06, 2012 at 03:10:10PM +0200, Eric Dumazet wrote: > > On Wed, 2012-06-06 at 14:13 +0300, Michael S. Tsirkin wrote: > > > > > We currently do all stats either on napi callback or from > > > start_xmit callback. > > > This makes them safe, yes? > > > > Hmm, then _bh() variant is needed in virtnet_stats(), as explained in > > include/linux/u64_stats_sync.h section 6) > > > > * 6) If counter might be written by an interrupt, readers should block interrupts. > > * (On UP, there is no seqcount_t protection, a reader allowing interrupts could > > * read partial values) > > > > Yes, its tricky... > > Sounds good, but I have a question: this realies on counters > being atomic on 64 bit. > Would not it be better to always use a seqlock even on 64 bit? > This way counters would actually be correct and in sync. > As it is if we want e.g. average packet size, > we can not rely e.g. on it being bytes/packets. When this stuff was discussed, we chose to have a nop on 64bits. Your point has little to do with 64bit stats, it was already like that with 'long int' counters. Consider average driver doing : dev->stats.rx_bytes += skb->len; dev->stats.rx_packets++; A concurrent reader can read an updated rx_bytes and a 'previous' rx_packets one. 'fixing' this requires a lot of work and memory barriers (in all drivers), for a very litle gain (at most one packet error) u64_stats_sync was really meant to be 0-cost on 64bit arches. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jun 06, 2012 at 05:19:04PM +0200, Eric Dumazet wrote: > On Wed, 2012-06-06 at 17:49 +0300, Michael S. Tsirkin wrote: > > On Wed, Jun 06, 2012 at 03:10:10PM +0200, Eric Dumazet wrote: > > > On Wed, 2012-06-06 at 14:13 +0300, Michael S. Tsirkin wrote: > > > > > > > We currently do all stats either on napi callback or from > > > > start_xmit callback. > > > > This makes them safe, yes? > > > > > > Hmm, then _bh() variant is needed in virtnet_stats(), as explained in > > > include/linux/u64_stats_sync.h section 6) > > > > > > * 6) If counter might be written by an interrupt, readers should block interrupts. > > > * (On UP, there is no seqcount_t protection, a reader allowing interrupts could > > > * read partial values) > > > > > > Yes, its tricky... > > > > Sounds good, but I have a question: this realies on counters > > being atomic on 64 bit. > > Would not it be better to always use a seqlock even on 64 bit? > > This way counters would actually be correct and in sync. > > As it is if we want e.g. average packet size, > > we can not rely e.g. on it being bytes/packets. > > When this stuff was discussed, we chose to have a nop on 64bits. > > Your point has little to do with 64bit stats, it was already like that > with 'long int' counters. Yes, of course. > Consider average driver doing : > > dev->stats.rx_bytes += skb->len; > dev->stats.rx_packets++; > > A concurrent reader can read an updated rx_bytes and a 'previous' > rx_packets one. > > 'fixing' this requires a lot of work and memory barriers (in all > drivers), for a very litle gain (at most one packet error) > u64_stats_sync was really meant to be 0-cost on 64bit arches. > > I understand, and not arguing about that. But why do you say at most 1 packet? Consider get_stats doing: u64_stats_update_begin(&stats->syncp); stats->tx_bytes += skb->len; on 64 bit at this point tx_packets might get incremented any number of times, no? stats->tx_packets++; u64_stats_update_end(&stats->syncp); now tx_bytes and tx_packets are out of sync by more than 1. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jun 06, 2012 at 05:19:04PM +0200, Eric Dumazet wrote: > On Wed, 2012-06-06 at 17:49 +0300, Michael S. Tsirkin wrote: > > On Wed, Jun 06, 2012 at 03:10:10PM +0200, Eric Dumazet wrote: > > > On Wed, 2012-06-06 at 14:13 +0300, Michael S. Tsirkin wrote: > > > > > > > We currently do all stats either on napi callback or from > > > > start_xmit callback. > > > > This makes them safe, yes? > > > > > > Hmm, then _bh() variant is needed in virtnet_stats(), as explained in > > > include/linux/u64_stats_sync.h section 6) > > > > > > * 6) If counter might be written by an interrupt, readers should block interrupts. > > > * (On UP, there is no seqcount_t protection, a reader allowing interrupts could > > > * read partial values) > > > > > > Yes, its tricky... > > > > Sounds good, but I have a question: this realies on counters > > being atomic on 64 bit. > > Would not it be better to always use a seqlock even on 64 bit? > > This way counters would actually be correct and in sync. > > As it is if we want e.g. average packet size, > > we can not rely e.g. on it being bytes/packets. > > When this stuff was discussed, we chose to have a nop on 64bits. > > Your point has little to do with 64bit stats, it was already like that > with 'long int' counters. > > Consider average driver doing : > > dev->stats.rx_bytes += skb->len; > dev->stats.rx_packets++; > > A concurrent reader can read an updated rx_bytes and a 'previous' > rx_packets one. > > 'fixing' this requires a lot of work and memory barriers (in all > drivers), for a very litle gain (at most one packet error) > > u64_stats_sync was really meant to be 0-cost on 64bit arches. > So for virtio since all counters get incremented from bh we can ensure they are read atomically, simply but reading them from the correct CPU with bh disabled. And then we don't need u64_stats_sync at all.
On Wed, 2012-06-06 at 19:17 +0300, Michael S. Tsirkin wrote: > But why do you say at most 1 packet? > > Consider get_stats doing: > u64_stats_update_begin(&stats->syncp); > stats->tx_bytes += skb->len; > > on 64 bit at this point > tx_packets might get incremented any number of times, no? > > stats->tx_packets++; > u64_stats_update_end(&stats->syncp); > > now tx_bytes and tx_packets are out of sync by more than 1. You lost me there. No idea of what you are thinking about. There is no atomicity guarantee in SNMP counters. (Ie fetching tx_bytes and tx_packets in a transaction is not mandatory in any RFC) As long as there is no cumulative error, its OK. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jun 06, 2012 at 07:13:02PM +0200, Eric Dumazet wrote: > On Wed, 2012-06-06 at 19:17 +0300, Michael S. Tsirkin wrote: > > > But why do you say at most 1 packet? > > > > Consider get_stats doing: > > u64_stats_update_begin(&stats->syncp); > > stats->tx_bytes += skb->len; > > > > on 64 bit at this point > > tx_packets might get incremented any number of times, no? > > > > stats->tx_packets++; > > u64_stats_update_end(&stats->syncp); > > > > now tx_bytes and tx_packets are out of sync by more than 1. > > You lost me there. > > No idea of what you are thinking about. Sorry about that. This is not a bug. I am saying two things: 1. We are trying to look at counters for purposes of tuning the device. E.g. if ethtool reports packets and bytes, we'd like to calculate average packet size by bytes/packets. If both counters are read atomically the metric becomes more exact. Not a must but nice to have. 2. 32 bit systems have some overhead because of the seqlock. virtio could instead simply keep tx counters in the queue structure, and get the tx lock when they are read.
On Wed, Jun 06, 2012 at 08:14:32AM -0700, Stephen Hemminger wrote: > On Wed, 6 Jun 2012 17:49:42 +0300 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > Sounds good, but I have a question: this realies on counters > > being atomic on 64 bit. > > Would not it be better to always use a seqlock even on 64 bit? > > This way counters would actually be correct and in sync. > > As it is if we want e.g. average packet size, > > we can not rely e.g. on it being bytes/packets. > > This has not been a requirement on real physical devices; therefore > the added overhead is not really justified. > > Many network cards use counters in hardware to count packets/bytes > and there is no expectation of atomic access there. BTW for cards that do implement the counters in software, under xmit lock, is anything wrong with simply taking the xmit lock when we get the stats instead of the per-cpu trick + seqlock? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2012-06-06 at 21:51 +0300, Michael S. Tsirkin wrote: > BTW for cards that do implement the counters in software, > under xmit lock, is anything wrong with simply taking the xmit lock > when we get the stats instead of the per-cpu trick + seqlock? > I still dont understand why you would do that. Most modern machines are 64bits, so there is no seqlock overhead, nothing at all. If you focus on 32bit hardware, just stick on 32bit counters ? Note that most u64_stats_sync users are virtual drivers, without xmit lock (LLTX drivers) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jun 06, 2012 at 09:54:01PM +0200, Eric Dumazet wrote: > On Wed, 2012-06-06 at 21:51 +0300, Michael S. Tsirkin wrote: > > > BTW for cards that do implement the counters in software, > > under xmit lock, is anything wrong with simply taking the xmit lock > > when we get the stats instead of the per-cpu trick + seqlock? > > > > I still dont understand why you would do that. > > Most modern machines are 64bits, so there is no seqlock overhead, > nothing at all. > > If you focus on 32bit hardware, just stick on 32bit counters ? These wrap around. > Note that most u64_stats_sync users are virtual drivers, without xmit > lock (LLTX drivers) > > Absolutely, I am talking about virtio here. I'm not kicking u64_stats_sync idea I am just saying that simple locking would work for virtio and might be better as it gives us a way to get counters atomically.
On Wed, 2012-06-06 at 19:57 +0300, Michael S. Tsirkin wrote: > So for virtio since all counters get incremented from bh we can > ensure they are read atomically, simply but reading them > from the correct CPU with bh disabled. > And then we don't need u64_stats_sync at all. > Really ? How are you going to read 64bit stats from foreign cpus on 32bit arches, without additional cost in fast path ? You should read include/linux/u64_stats_sync.h to fully understand the issues. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2012-06-06 at 21:43 +0300, Michael S. Tsirkin wrote: > 1. We are trying to look at counters for purposes of tuning the device. > E.g. if ethtool reports packets and bytes, we'd like to calculate > average packet size by bytes/packets. > > If both counters are read atomically the metric becomes more exact. > Not a must but nice to have. > metrics are exact right now. As soon as you read a value, it might already have changed. Maybe you want to stop_machine() to make sure all the metrics you want are 'exact' ;) > 2. 32 bit systems have some overhead because of the seqlock. > virtio could instead simply keep tx counters in the queue structure, and > get the tx lock when they are read. > But then you need atomic64 stuff, have you an idea of the cost of such primitives on 32bit ? 3. use 32bit counters on 32bit arches, as many drivers still do ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2012-06-06 at 22:58 +0300, Michael S. Tsirkin wrote: > Absolutely, I am talking about virtio here. I'm not kicking > u64_stats_sync idea I am just saying that simple locking > would work for virtio and might be better as it > gives us a way to get counters atomically. Which lock do you own in the RX path ? You'll have to add a lock in fast path. This sounds really a bad choice to me. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jun 06, 2012 at 10:08:09PM +0200, Eric Dumazet wrote: > On Wed, 2012-06-06 at 22:58 +0300, Michael S. Tsirkin wrote: > > > Absolutely, I am talking about virtio here. I'm not kicking > > u64_stats_sync idea I am just saying that simple locking > > would work for virtio and might be better as it > > gives us a way to get counters atomically. > > Which lock do you own in the RX path ? We can just disable napi, everything is updated from napi callback. > You'll have to add a lock in fast path. This sounds really a bad choice > to me. .ndo_get_stats64 is not data path though, is it?
On Wed, 2012-06-06 at 22:08 +0200, Eric Dumazet wrote: > On Wed, 2012-06-06 at 22:58 +0300, Michael S. Tsirkin wrote: > > > Absolutely, I am talking about virtio here. I'm not kicking > > u64_stats_sync idea I am just saying that simple locking > > would work for virtio and might be better as it > > gives us a way to get counters atomically. > > Which lock do you own in the RX path ? > > You'll have to add a lock in fast path. This sounds really a bad choice > to me. You have the NAPI 'lock', so when gathering stats you can synchronise using napi_disable() ;-) Ben.
On Wed, Jun 06, 2012 at 10:06:11PM +0200, Eric Dumazet wrote: > On Wed, 2012-06-06 at 21:43 +0300, Michael S. Tsirkin wrote: > > > 1. We are trying to look at counters for purposes of tuning the device. > > E.g. if ethtool reports packets and bytes, we'd like to calculate > > average packet size by bytes/packets. > > > > If both counters are read atomically the metric becomes more exact. > > Not a must but nice to have. > > > > metrics are exact right now. Yes, but they are not synchronised between themselves. E.g. you can in theory have a report where #of packets > #of bytes. I know there's no guarantee they are synchronised on an arbitrary device but if they are, without slowing fast path, it's nice. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2012-06-06 at 23:16 +0300, Michael S. Tsirkin wrote: > On Wed, Jun 06, 2012 at 10:08:09PM +0200, Eric Dumazet wrote: > > On Wed, 2012-06-06 at 22:58 +0300, Michael S. Tsirkin wrote: > > > > > Absolutely, I am talking about virtio here. I'm not kicking > > > u64_stats_sync idea I am just saying that simple locking > > > would work for virtio and might be better as it > > > gives us a way to get counters atomically. > > > > Which lock do you own in the RX path ? > > We can just disable napi, everything is updated from napi callback. This is very disruptive, and illegal from ndo_get_stats64() (ndo_get_stats64() is not allowed to sleep, and I cant see how you are going to disable napi without sleeping) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2012-06-06 at 21:19 +0100, Ben Hutchings wrote: > On Wed, 2012-06-06 at 22:08 +0200, Eric Dumazet wrote: > > On Wed, 2012-06-06 at 22:58 +0300, Michael S. Tsirkin wrote: > > > > > Absolutely, I am talking about virtio here. I'm not kicking > > > u64_stats_sync idea I am just saying that simple locking > > > would work for virtio and might be better as it > > > gives us a way to get counters atomically. > > > > Which lock do you own in the RX path ? > > > > You'll have to add a lock in fast path. This sounds really a bad choice > > to me. > > You have the NAPI 'lock', so when gathering stats you can synchronise > using napi_disable() ;-) Nice, this adds one new bug in network stack. Really guys, can we stop this thread, please ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2012-06-06 at 23:16 +0300, Michael S. Tsirkin wrote: > On Wed, Jun 06, 2012 at 10:08:09PM +0200, Eric Dumazet wrote: > > On Wed, 2012-06-06 at 22:58 +0300, Michael S. Tsirkin wrote: > > > > > Absolutely, I am talking about virtio here. I'm not kicking > > > u64_stats_sync idea I am just saying that simple locking > > > would work for virtio and might be better as it > > > gives us a way to get counters atomically. > > > > Which lock do you own in the RX path ? > > We can just disable napi, everything is updated from napi callback. Seriously, though: don't do that; this is going to hurt performance for minimal benefit. Ben. > > You'll have to add a lock in fast path. This sounds really a bad choice > > to me. > > .ndo_get_stats64 is not data path though, is it? >
On Wed, 2012-06-06 at 22:24 +0200, Eric Dumazet wrote: > (ndo_get_stats64() is not allowed to sleep, and I cant see how you are > going to disable napi without sleeping) > > In case you wonder, take a look at bond_get_stats() in drivers/net/bonding/bond_main.c -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jun 06, 2012 at 09:35:04PM +0100, Ben Hutchings wrote: > On Wed, 2012-06-06 at 23:16 +0300, Michael S. Tsirkin wrote: > > On Wed, Jun 06, 2012 at 10:08:09PM +0200, Eric Dumazet wrote: > > > On Wed, 2012-06-06 at 22:58 +0300, Michael S. Tsirkin wrote: > > > > > > > Absolutely, I am talking about virtio here. I'm not kicking > > > > u64_stats_sync idea I am just saying that simple locking > > > > would work for virtio and might be better as it > > > > gives us a way to get counters atomically. > > > > > > Which lock do you own in the RX path ? > > > > We can just disable napi, everything is updated from napi callback. > > Seriously, though: don't do that; this is going to hurt performance for > minimal benefit. > > Ben. Yea, it doesn't work anyway. Maybe take a xmit lock for tx and keep using the per-cpu counters for rx. Or does this sound too disruptive too? > > > You'll have to add a lock in fast path. This sounds really a bad choice > > > to me. > > > > .ndo_get_stats64 is not data path though, is it? > > > > -- > Ben Hutchings, Staff Engineer, Solarflare > Not speaking for my employer; that's the marketing department's job. > They asked us to note that Solarflare product names are trademarked. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 5214b1e..705aaa7 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -703,12 +703,12 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev, u64 tpackets, tbytes, rpackets, rbytes; do { - start = u64_stats_fetch_begin(&stats->syncp); + start = u64_stats_fetch_begin_bh(&stats->syncp); tpackets = stats->tx_packets; tbytes = stats->tx_bytes; rpackets = stats->rx_packets; rbytes = stats->rx_bytes; - } while (u64_stats_fetch_retry(&stats->syncp, start)); + } while (u64_stats_fetch_retry_bh(&stats->syncp, start)); tot->rx_packets += rpackets; tot->tx_packets += tpackets;