| Message ID | 490AE1CD.9040207@cosmosbay.com |
|---|---|
| State | RFC, archived |
| Delegated to | David Miller |
On Fri, 31 Oct 2008, Eric Dumazet wrote:

> David Miller wrote:
> > From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
> > Date: Fri, 31 Oct 2008 11:40:16 +0200 (EET)
> >
> > > Let me remind that it is just a single process, so no ping-pong & other
> > > lock related cache effects should play any significant role here, no? (I'm
> > > no expert though :-)).
> >
> > Not locks or ping-pongs perhaps, I guess. So it just sends and
> > receives over a socket, implementing both ends of the communication
> > in the same process?
> >
> > If hash chain conflicts do happen for those 2 sockets, just traversing
> > the chain 2 entries deep could show up.
>
> tbench is very sensitive to cache line ping-pongs (on SMP machines of course)

...Sorry to disappoint you but we were discussing my AIM9 tcp_test results :-).
Ilpo Järvinen wrote:
> On Fri, 31 Oct 2008, Eric Dumazet wrote:
>
>> David Miller wrote:
>>> From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
>>> Date: Fri, 31 Oct 2008 11:40:16 +0200 (EET)
>>>
>>>> Let me remind that it is just a single process, so no ping-pong & other
>>>> lock related cache effects should play any significant role here, no? (I'm
>>>> no expert though :-)).
>>>
>>> Not locks or ping-pongs perhaps, I guess. So it just sends and
>>> receives over a socket, implementing both ends of the communication
>>> in the same process?
>>>
>>> If hash chain conflicts do happen for those 2 sockets, just traversing
>>> the chain 2 entries deep could show up.
>>
>> tbench is very sensitive to cache line ping-pongs (on SMP machines of course)
>
> ...Sorry to disappoint you but we were discussing my AIM9 tcp_test results :-).

Well, before you added AIM9 on this topic, we were focusing on tbench :)

Sorry to disappoint you :)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 31 Oct 2008, Eric Dumazet wrote:

> Ilpo Järvinen wrote:
> > On Fri, 31 Oct 2008, Eric Dumazet wrote:
> >
> > > David Miller wrote:
> > > > From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
> > > > Date: Fri, 31 Oct 2008 11:40:16 +0200 (EET)
> > > >
> > > > > Let me remind that it is just a single process, so no ping-pong & other
> > > > > lock related cache effects should play any significant role here, no? (I'm
> > > > > no expert though :-)).
> > > >
> > > > Not locks or ping-pongs perhaps, I guess. So it just sends and
> > > > receives over a socket, implementing both ends of the communication
> > > > in the same process?
> > > >
> > > > If hash chain conflicts do happen for those 2 sockets, just traversing
> > > > the chain 2 entries deep could show up.
> > >
> > > tbench is very sensitive to cache line ping-pongs (on SMP machines of course)
> >
> > ...Sorry to disappoint you but we were discussing my AIM9 tcp_test results :-).
>
> Well, before you added AIM9 on this topic, we were focusing on tbench :)
>
> Sorry to disappoint you :)

It's all Stephen's fault, he added port randomization first... ;-)
On Fri, 31 Oct 2008 11:45:33 +0100
Eric Dumazet <dada1@cosmosbay.com> wrote:

> David Miller wrote:
> > From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
> > Date: Fri, 31 Oct 2008 11:40:16 +0200 (EET)
> >
> >> Let me remind that it is just a single process, so no ping-pong & other
> >> lock related cache effects should play any significant role here, no? (I'm
> >> no expert though :-)).
> >
> > Not locks or ping-pongs perhaps, I guess. So it just sends and
> > receives over a socket, implementing both ends of the communication
> > in the same process?
> >
> > If hash chain conflicts do happen for those 2 sockets, just traversing
> > the chain 2 entries deep could show up.
>
> tbench is very sensitive to cache line ping-pongs (on SMP machines of course)
>
> Just to prove my point, I coded the following patch and tried it
> on a HP BL460c G1. This machine has 2 quad core cpus
> (Intel(R) Xeon(R) CPU E5450 @3.00GHz).
>
> tbench 8 went from 2240 MB/s to 2310 MB/s with this patch applied.
>
> [PATCH] net: Introduce netif_set_last_rx() helper
>
> On SMP machines, the loopback device (and possibly other net devices)
> should avoid dirtying the memory cache line containing the "last_rx"
> field. Got a 3% increase on tbench on an 8 cpu machine.
>
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> ---
>  drivers/net/loopback.c    |    2 +-
>  include/linux/netdevice.h |   16 ++++++++++++++++
>  2 files changed, 17 insertions(+), 1 deletion(-)

Why bother with last_rx at all on loopback? I have been thinking we
should figure out a way to get rid of last_rx altogether. It only seems
to be used by bonding, and the bonding driver could do the calculation
in its receive handling.
On Fri, Oct 31, 2008 at 12:57:13PM -0700, Stephen Hemminger (shemminger@vyatta.com) wrote:
> Why bother with last_rx at all on loopback? I have been thinking we
> should figure out a way to get rid of last_rx altogether. It only seems
> to be used by bonding, and the bonding driver could do the calculation
> in its receive handling.

This is not related to the regression: the bug would just be papered over
by these changes. Having bonding on loopback is a somewhat strange idea,
but still this kind of change is an attempt to play a good game in a bad
one: this loopback-only optimization does not fix the underlying problem.
Evgeniy Polyakov wrote:
> On Fri, Oct 31, 2008 at 12:57:13PM -0700, Stephen Hemminger (shemminger@vyatta.com) wrote:
>> Why bother with last_rx at all on loopback? I have been thinking we
>> should figure out a way to get rid of last_rx altogether. It only seems
>> to be used by bonding, and the bonding driver could do the calculation
>> in its receive handling.
>
> Not related to the regression: the bug would just be papered over by these
> changes. Having bonding on loopback is a somewhat strange idea, but still
> this kind of change is an attempt to play a good game in a bad one: this
> loopback-only optimization does not fix the problem.

Just to be clear, this change was not meant to be committed.
It was already rejected by David some years ago (2005 and 2006):

http://www.mail-archive.com/netdev@vger.kernel.org/msg07382.html

If you read my mail, I was *only* saying that tbench results can be
sensitive to cache line ping-pongs. tbench is a crazy benchmark, and
only a crazy benchmark.

Optimizing linux for tbench's sake would be... crazy?
Hi Eric.

On Fri, Oct 31, 2008 at 10:03:00PM +0100, Eric Dumazet (dada1@cosmosbay.com) wrote:
> Just to be clear, this change was not meant to be committed.
> It was already rejected by David some years ago (2005 and 2006):
>
> http://www.mail-archive.com/netdev@vger.kernel.org/msg07382.html
>
> If you read my mail, I was *only* saying that tbench results can be
> sensitive to cache line ping-pongs. tbench is a crazy benchmark, and
> only a crazy benchmark.

No problem Eric, I just pointed out that this particular change is rather
cosmetic and does not really fix anything. It improves the numbers, but
the way it does so is not the right one imho. We definitely want to
eliminate stores to globally shared, constantly updated variables in
paths where they are not required, but in a way that improves the design
and implementation, not one that hides some other problem.

tbench is, well, quite a usual network server :) The dbench side is
rather non-optimized, but it is still quite a common pattern of
small-sized IO.

Anyway, optimizing for one kind of workload tends to make other
workloads slower, so I agree of course that any narrow-viewed
optimizations are bad, and instead we should search for the underlying
error pattern more widely.
From: Eric Dumazet <dada1@cosmosbay.com>
Date: Fri, 31 Oct 2008 22:03:00 +0100

> Evgeniy Polyakov wrote:
> > On Fri, Oct 31, 2008 at 12:57:13PM -0700, Stephen Hemminger (shemminger@vyatta.com) wrote:
> >> Why bother with last_rx at all on loopback? I have been thinking we
> >> should figure out a way to get rid of last_rx altogether. It only seems
> >> to be used by bonding, and the bonding driver could do the calculation
> >> in its receive handling.
> >
> > Not related to the regression: the bug would just be papered over by these
> > changes. Having bonding on loopback is a somewhat strange idea, but still
> > this kind of change is an attempt to play a good game in a bad one: this
> > loopback-only optimization does not fix the problem.
>
> Just to be clear, this change was not meant to be committed.
> It was already rejected by David some years ago (2005 and 2006):
>
> http://www.mail-archive.com/netdev@vger.kernel.org/msg07382.html

However, I do like Stephen's suggestion that maybe we can get rid of
this ->last_rx thing by encapsulating the logic completely in the
bonding driver.

> If you read my mail, I was *only* saying that tbench results can be
> sensitive to cache line ping-pongs. tbench is a crazy benchmark, and
> only a crazy benchmark.
>
> Optimizing linux for tbench's sake would be... crazy?

Unlike dbench, I think tbench is worth cranking up as much as possible.

It doesn't have a huge memory working set; it just writes mostly small
messages over a TCP socket back and forth, and does a lot of blocking.
And I think we'd like all of those operations to run as fast as possible.

When Tridge first wrote tbench I would see the expected things at the
top of the profiles: things like tcp_ack(), copy to/from user, and
perhaps SLAB. Things have changed considerably.
On Fri, 31 Oct 2008 16:51:44 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Eric Dumazet <dada1@cosmosbay.com>
> Date: Fri, 31 Oct 2008 22:03:00 +0100
>
> > Evgeniy Polyakov wrote:
> > > On Fri, Oct 31, 2008 at 12:57:13PM -0700, Stephen Hemminger (shemminger@vyatta.com) wrote:
> > > > Why bother with last_rx at all on loopback? I have been thinking we
> > > > should figure out a way to get rid of last_rx altogether. It only seems
> > > > to be used by bonding, and the bonding driver could do the calculation
> > > > in its receive handling.
> > >
> > > Not related to the regression: the bug would just be papered over by these
> > > changes. Having bonding on loopback is a somewhat strange idea, but still
> > > this kind of change is an attempt to play a good game in a bad one: this
> > > loopback-only optimization does not fix the problem.
> >
> > Just to be clear, this change was not meant to be committed.
> > It was already rejected by David some years ago (2005 and 2006):
> >
> > http://www.mail-archive.com/netdev@vger.kernel.org/msg07382.html
>
> However, I do like Stephen's suggestion that maybe we can get rid of
> this ->last_rx thing by encapsulating the logic completely in the
> bonding driver.

Since the bonding driver doesn't actually see the rx packets, that isn't
really possible. But it would be possible to change last_rx from a
variable to a function pointer, so that devices could apply other logic
to derive the last value. One example would be to keep it per cpu and
then take the maximum.
diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 3b43bfd..cf17238 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -85,7 +85,7 @@ static int loopback_xmit(struct sk_buff *skb, struct net_device *dev)
 		return 0;
 	}
 #endif
-	dev->last_rx = jiffies;
+	netif_set_last_rx(dev);
 
 	/* it's OK to use per_cpu_ptr() because BHs are off */
 	pcpu_lstats = dev->ml_priv;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c8bcb59..6729865 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -849,6 +849,22 @@ static inline void *netdev_priv(const struct net_device *dev)
 #define SET_NETDEV_DEV(net, pdev)	((net)->dev.parent = (pdev))
 
 /**
+ * netif_set_last_rx - Set last_rx field of a device
+ * @dev: network device
+ *
+ * Instead of setting net->last_rx to jiffies, drivers should call this helper
+ * to avoid dirtying a cache line if last_rx already has the current jiffies
+ */
+static inline void netif_set_last_rx(struct net_device *dev)
+{
+#ifdef CONFIG_SMP
+	if (dev->last_rx == jiffies)
+		return;
+#endif
+	dev->last_rx = jiffies;
+}
+
+/**
  * netif_napi_add - initialize a napi context
  * @dev: network device
  * @napi: napi context