Message ID | 1354892334.29937.14.camel@joe-AO722 |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
2012/12/7 Joe Perches <joe@perches.com>: > Given that the struct will always have limit at the start of > a cacheline, why not make struct ___cacheline_aligned_on_smp > and make limit the first member? > > It could make other structs that use struct dql a bit more > predictable or efficient to pack. > > (netdev_queue is size reduced from 256 to 192 on x86-32) > No, please. Have you tested this on a range of hardware and check how it can hurt performance ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2012-12-07 at 07:55 -0800, Eric Dumazet wrote: > 2012/12/7 Joe Perches <joe@perches.com>: > > Given that the struct will always have limit at the start of > > a cacheline, why not make struct ___cacheline_aligned_on_smp > > and make limit the first member? > > > > It could make other structs that use struct dql a bit more > > predictable or efficient to pack. > > > > (netdev_queue is size reduced from 256 to 192 on x86-32) > > > > No, please. > > Have you tested this on a range of hardware and check how it can hurt > performance ? No. Hence the RFC subject title and unsigned patch. I was wondering though about cacheline ping-pong effects. I noted Tom's comment back in http://patchwork.ozlabs.org/patch/108856/ "Also, the cache line containing the struct dql can ping-pong between CPUs doing initiation and completion. (I know we're aiming for these to be the same, but we can't yet assume they will be.)" So it seemed somewhat sensible to make the entire struct in a single cacheline. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2012-12-07 at 08:05 -0800, Joe Perches wrote: > So it seemed somewhat sensible to make the > entire struct in a single cacheline. Any layout change in an object used in network fast path need a complete performance study. Even if you provide such a study, we'll need to reproduce your numbers here. BQL/DQL is not on our radars, spending two cache lines on a critical object is fine. Thanks -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2012-12-07 at 08:19 -0800, Eric Dumazet wrote: > On Fri, 2012-12-07 at 08:05 -0800, Joe Perches wrote: > > > So it seemed somewhat sensible to make the > > entire struct in a single cacheline. > > Any layout change in an object used in network fast path need a complete > performance study. > > Even if you provide such a study, we'll need to reproduce your numbers > here. > > BQL/DQL is not on our radars, spending two cache lines on a critical > object is fine. Well Maybe Tom can provide some information as to why the limit variable was cacheline_aligned_in_smp and not the struct. I didn't find any discussion about it. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2012-12-07 at 08:29 -0800, Joe Perches wrote: > On Fri, 2012-12-07 at 08:19 -0800, Eric Dumazet wrote: > > On Fri, 2012-12-07 at 08:05 -0800, Joe Perches wrote: > > > > > So it seemed somewhat sensible to make the > > > entire struct in a single cacheline. > > > > Any layout change in an object used in network fast path need a complete > > performance study. > > > > Even if you provide such a study, we'll need to reproduce your numbers > > here. > > > > BQL/DQL is not on our radars, spending two cache lines on a critical > > object is fine. > > Well Maybe Tom can provide some information as to why > the limit variable was cacheline_aligned_in_smp and not > the struct. > > I didn't find any discussion about it. Structure alignment has to be at least the maximum of each member's alignment, so the struct *is* effectively cacheline_aligned_in_smp. Ben.
On Fri, 2012-12-07 at 08:29 -0800, Joe Perches wrote: > Well Maybe Tom can provide some information as to why > the limit variable was cacheline_aligned_in_smp and not > the struct. > > I didn't find any discussion about it. > The struct _is_ cache line aligned, since it contains one field needing cache line alignment. Its pretty clear to us. There are two cache lines in it. We don't one a single cache line, but two, for performance reasons. If you believe its wrong, you have to provide the performance study, not me, as I don't have time to spend cycles on this. We did this a long time ago. If you want to save few bytes on your kernel, redefine __cacheline_aligned_on_smp to empty, and you'll be ok. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h index 5621547..2d20b22 100644 --- a/include/linux/dynamic_queue_limits.h +++ b/include/linux/dynamic_queue_limits.h @@ -38,6 +38,8 @@ #ifdef __KERNEL__ struct dql { + unsigned int limit; /* Must be first field - Current limit */ + /* Fields accessed in enqueue path (dql_queued) */ unsigned int num_queued; /* Total ever queued */ unsigned int adj_limit; /* limit + num_completed */ @@ -45,7 +47,6 @@ struct dql { /* Fields accessed only by completion path (dql_completed) */ - unsigned int limit ____cacheline_aligned_in_smp; /* Current limit */ unsigned int num_completed; /* Total ever completed */ unsigned int prev_ovlimit; /* Previous over limit */ @@ -59,7 +60,7 @@ struct dql { unsigned int max_limit; /* Max limit */ unsigned int min_limit; /* Minimum limit */ unsigned int slack_hold_time; /* Time to measure slack */ -}; +} ____cacheline_aligned_in_smp; /* Set some static maximums */ #define DQL_MAX_OBJECT (UINT_MAX / 16)