diff mbox

[RFC] dynamic_queue_limit.h: Make the struct ___cacheline_aligned_on_smp

Message ID 1354892334.29937.14.camel@joe-AO722
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Joe Perches Dec. 7, 2012, 2:58 p.m. UTC
Given that the struct will always have limit at the start of
a cacheline, why not make  struct ___cacheline_aligned_on_smp
and make limit the first member?

It could make other structs that use struct dql a bit more
predictable or efficient to pack.

(netdev_queue is size reduced from 256 to 192 on x86-32)

---

 include/linux/dynamic_queue_limits.h |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Dumazet Dec. 7, 2012, 3:55 p.m. UTC | #1
2012/12/7 Joe Perches <joe@perches.com>:
> Given that the struct will always have limit at the start of
> a cacheline, why not make  struct ___cacheline_aligned_on_smp
> and make limit the first member?
>
> It could make other structs that use struct dql a bit more
> predictable or efficient to pack.
>
> (netdev_queue is size reduced from 256 to 192 on x86-32)
>

No, please.

Have you tested this on a range of hardware and check how it can hurt
performance ?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Joe Perches Dec. 7, 2012, 4:05 p.m. UTC | #2
On Fri, 2012-12-07 at 07:55 -0800, Eric Dumazet wrote:
> 2012/12/7 Joe Perches <joe@perches.com>:
> > Given that the struct will always have limit at the start of
> > a cacheline, why not make  struct ___cacheline_aligned_on_smp
> > and make limit the first member?
> >
> > It could make other structs that use struct dql a bit more
> > predictable or efficient to pack.
> >
> > (netdev_queue is size reduced from 256 to 192 on x86-32)
> >
> 
> No, please.
> 
> Have you tested this on a range of hardware and check how it can hurt
> performance ?

No.  Hence the RFC subject title and
unsigned patch.

I was wondering though about cacheline ping-pong 
effects.

I noted Tom's comment back in
http://patchwork.ozlabs.org/patch/108856/
"Also, the cache line containing the struct dql can ping-pong between
CPUs doing initiation and completion.  (I know we're aiming for these to
be the same, but we can't yet assume they will be.)"

So it seemed somewhat sensible to make the
entire struct in a single cacheline.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Dec. 7, 2012, 4:19 p.m. UTC | #3
On Fri, 2012-12-07 at 08:05 -0800, Joe Perches wrote:

> So it seemed somewhat sensible to make the
> entire struct in a single cacheline.

Any layout change in an object used in network fast path need a complete
performance study.

Even if you provide such a study, we'll need to reproduce your numbers
here.

BQL/DQL is not on our radars, spending two cache lines on a critical
object is fine.

Thanks


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Joe Perches Dec. 7, 2012, 4:29 p.m. UTC | #4
On Fri, 2012-12-07 at 08:19 -0800, Eric Dumazet wrote:
> On Fri, 2012-12-07 at 08:05 -0800, Joe Perches wrote:
> 
> > So it seemed somewhat sensible to make the
> > entire struct in a single cacheline.
> 
> Any layout change in an object used in network fast path need a complete
> performance study.
> 
> Even if you provide such a study, we'll need to reproduce your numbers
> here.
> 
> BQL/DQL is not on our radars, spending two cache lines on a critical
> object is fine.

Well Maybe Tom can provide some information as to why
the limit variable was cacheline_aligned_in_smp and not
the struct.

I didn't find any discussion about it.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Hutchings Dec. 7, 2012, 4:42 p.m. UTC | #5
On Fri, 2012-12-07 at 08:29 -0800, Joe Perches wrote:
> On Fri, 2012-12-07 at 08:19 -0800, Eric Dumazet wrote:
> > On Fri, 2012-12-07 at 08:05 -0800, Joe Perches wrote:
> > 
> > > So it seemed somewhat sensible to make the
> > > entire struct in a single cacheline.
> > 
> > Any layout change in an object used in network fast path need a complete
> > performance study.
> > 
> > Even if you provide such a study, we'll need to reproduce your numbers
> > here.
> > 
> > BQL/DQL is not on our radars, spending two cache lines on a critical
> > object is fine.
> 
> Well Maybe Tom can provide some information as to why
> the limit variable was cacheline_aligned_in_smp and not
> the struct.
> 
> I didn't find any discussion about it.

Structure alignment has to be at least the maximum of each member's
alignment, so the struct *is* effectively cacheline_aligned_in_smp.

Ben.
Eric Dumazet Dec. 7, 2012, 4:54 p.m. UTC | #6
On Fri, 2012-12-07 at 08:29 -0800, Joe Perches wrote:

> Well Maybe Tom can provide some information as to why
> the limit variable was cacheline_aligned_in_smp and not
> the struct.
> 
> I didn't find any discussion about it.
> 

The struct _is_ cache line aligned, since it contains one field needing
cache line alignment. Its pretty clear to us.

There are two cache lines in it.

We don't one a single cache line, but two, for performance reasons.

If you believe its wrong, you have to provide the performance study, not
me, as I don't have time to spend cycles on this. We did this a long
time ago.

If you want to save few bytes on your kernel, redefine
__cacheline_aligned_on_smp to empty, and you'll be ok.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
index 5621547..2d20b22 100644
--- a/include/linux/dynamic_queue_limits.h
+++ b/include/linux/dynamic_queue_limits.h
@@ -38,6 +38,8 @@ 
 #ifdef __KERNEL__
 
 struct dql {
+	unsigned int	limit; /* Must be first field - Current limit */
+
 	/* Fields accessed in enqueue path (dql_queued) */
 	unsigned int	num_queued;		/* Total ever queued */
 	unsigned int	adj_limit;		/* limit + num_completed */
@@ -45,7 +47,6 @@  struct dql {
 
 	/* Fields accessed only by completion path (dql_completed) */
 
-	unsigned int	limit ____cacheline_aligned_in_smp; /* Current limit */
 	unsigned int	num_completed;		/* Total ever completed */
 
 	unsigned int	prev_ovlimit;		/* Previous over limit */
@@ -59,7 +60,7 @@  struct dql {
 	unsigned int	max_limit;		/* Max limit */
 	unsigned int	min_limit;		/* Minimum limit */
 	unsigned int	slack_hold_time;	/* Time to measure slack */
-};
+} ____cacheline_aligned_in_smp;
 
 /* Set some static maximums */
 #define DQL_MAX_OBJECT (UINT_MAX / 16)