diff mbox

net: force dst_default_metrics to const section

Message ID 1344372945.28967.165.camel@edumazet-glaptop
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Aug. 7, 2012, 8:55 p.m. UTC
From: Eric Dumazet <edumazet@google.com>


> Some day the compiler may be smart enough to ignore the different
> between explicit and implicit zero-initialisation, and put it back in
> BSS.  Declaring this __cache_aligned_in_smp might be a better option.

__cache_aligned_in_smp aligns start of the structure, but can be
followed by another var in same cache line. Yes, this is bad.

By the way we dont care of cache alignment on this structure, only it
should be const. Its a soft requirement, machine wont crash if it is not
the case.

If compiler is smart one day as you say (it should first be non buggy
IMHO), then we can add a non zero field like this :

Thanks

[PATCH v2] net: force dst_default_metrics to const section

While investigating on network performance problems, I found this little
gem :

$ nm -v vmlinux | grep -1 dst_default_metrics
ffffffff82736540 b busy.46605 
ffffffff82736560 B dst_default_metrics
ffffffff82736598 b dst_busy_list

Apparently, declaring a const array without initializer put it in
(writeable) bss section, in middle of possibly often dirtied cache
lines.

Since we really want dst_default_metrics be const to avoid any possible
false sharing and catch any buggy writes, I force a null initializer.

ffffffff818a4c20 R dst_default_metrics

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
---
 include/net/dst.h |    2 +-
 net/core/dst.c    |   10 +++++++++-
 2 files changed, 10 insertions(+), 2 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Ben Hutchings Aug. 7, 2012, 10:12 p.m. UTC | #1
On Tue, 2012-08-07 at 22:55 +0200, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> 
> > Some day the compiler may be smart enough to ignore the different
> > between explicit and implicit zero-initialisation, and put it back in
> > BSS.  Declaring this __cache_aligned_in_smp might be a better option.
> 
> __cache_aligned_in_smp aligns start of the structure, but can be
> followed by another var in same cache line. Yes, this is bad.

Oh, that's unexpected.

> By the way we dont care of cache alignment on this structure, only it
> should be const. Its a soft requirement, machine wont crash if it is not
> the case.

Right.

> If compiler is smart one day as you say (it should first be non buggy
> IMHO), then we can add a non zero field like this :
[...]

That would work, but it's ugly!  How about defining and using a
meaningfully-named macro that expands to __section(.rodata)?

Ben.
Eric Dumazet Aug. 7, 2012, 10:34 p.m. UTC | #2
On Tue, 2012-08-07 at 23:12 +0100, Ben Hutchings wrote:
> On Tue, 2012-08-07 at 22:55 +0200, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > 
> > > Some day the compiler may be smart enough to ignore the different
> > > between explicit and implicit zero-initialisation, and put it back in
> > > BSS.  Declaring this __cache_aligned_in_smp might be a better option.
> > 
> > __cache_aligned_in_smp aligns start of the structure, but can be
> > followed by another var in same cache line. Yes, this is bad.
> 
> Oh, that's unexpected.
> 
> > By the way we dont care of cache alignment on this structure, only it
> > should be const. Its a soft requirement, machine wont crash if it is not
> > the case.
> 
> Right.
> 
> > If compiler is smart one day as you say (it should first be non buggy
> > IMHO), then we can add a non zero field like this :
> [...]
> 
> That would work, but it's ugly!  How about defining and using a
> meaningfully-named macro that expands to __section(.rodata)?

You are kidding. I prefer plain C and not having to mess with all
arches.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Hutchings Aug. 7, 2012, 10:44 p.m. UTC | #3
On Wed, 2012-08-08 at 00:34 +0200, Eric Dumazet wrote:
> On Tue, 2012-08-07 at 23:12 +0100, Ben Hutchings wrote:
> > On Tue, 2012-08-07 at 22:55 +0200, Eric Dumazet wrote:
> > > From: Eric Dumazet <edumazet@google.com>
> > > 
> > > 
> > > > Some day the compiler may be smart enough to ignore the different
> > > > between explicit and implicit zero-initialisation, and put it back in
> > > > BSS.  Declaring this __cache_aligned_in_smp might be a better option.
> > > 
> > > __cache_aligned_in_smp aligns start of the structure, but can be
> > > followed by another var in same cache line. Yes, this is bad.
> > 
> > Oh, that's unexpected.
> > 
> > > By the way we dont care of cache alignment on this structure, only it
> > > should be const. Its a soft requirement, machine wont crash if it is not
> > > the case.
> > 
> > Right.
> > 
> > > If compiler is smart one day as you say (it should first be non buggy
> > > IMHO), then we can add a non zero field like this :
> > [...]
> > 
> > That would work, but it's ugly!  How about defining and using a
> > meaningfully-named macro that expands to __section(.rodata)?
> 
> You are kidding. I prefer plain C and not having to mess with all
> arches.

Any consideration of implementation details like BSS and cache line
sharing is already outside of 'plain C'.  And you don't have to 'mess
with all arches'; just look at what <linux/init.h> and <linux/module.h>
do.

Ben.
Eric Dumazet Aug. 7, 2012, 10:56 p.m. UTC | #4
On Tue, 2012-08-07 at 23:44 +0100, Ben Hutchings wrote:

> Any consideration of implementation details like BSS and cache line
> sharing is already outside of 'plain C'.  And you don't have to 'mess
> with all arches'; just look at what <linux/init.h> and <linux/module.h>
> do.

"const" is the clean and portable way to express my needs. No magic
section.

All is self-contained in the definition of the metrics, with a nice
comment.

All const are naturally shared by all cpus, without adding extra cache
line boundaries.

Please feel free to send another version, but I wont spend more time
myself.

Thanks


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Aug. 7, 2012, 11:17 p.m. UTC | #5
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 08 Aug 2012 00:56:40 +0200

> On Tue, 2012-08-07 at 23:44 +0100, Ben Hutchings wrote:
> 
>> Any consideration of implementation details like BSS and cache line
>> sharing is already outside of 'plain C'.  And you don't have to 'mess
>> with all arches'; just look at what <linux/init.h> and <linux/module.h>
>> do.
> 
> "const" is the clean and portable way to express my needs. No magic
> section.
> 
> All is self-contained in the definition of the metrics, with a nice
> comment.
> 
> All const are naturally shared by all cpus, without adding extra cache
> line boundaries.

I agree with Eric and will apply his patch.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Aug. 8, 2012, 11 p.m. UTC | #6
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 07 Aug 2012 22:55:45 +0200

> [PATCH v2] net: force dst_default_metrics to const section
> 
> While investigating on network performance problems, I found this little
> gem :
> 
> $ nm -v vmlinux | grep -1 dst_default_metrics
> ffffffff82736540 b busy.46605 
> ffffffff82736560 B dst_default_metrics
> ffffffff82736598 b dst_busy_list
> 
> Apparently, declaring a const array without initializer put it in
> (writeable) bss section, in middle of possibly often dirtied cache
> lines.
> 
> Since we really want dst_default_metrics be const to avoid any possible
> false sharing and catch any buggy writes, I force a null initializer.
> 
> ffffffff818a4c20 R dst_default_metrics
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Ben Hutchings <bhutchings@solarflare.com>

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/net/dst.h b/include/net/dst.h
index baf5978..621e351 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -110,7 +110,7 @@  struct dst_entry {
 };
 
 extern u32 *dst_cow_metrics_generic(struct dst_entry *dst, unsigned long old);
-extern const u32 dst_default_metrics[RTAX_MAX];
+extern const u32 dst_default_metrics[];
 
 #define DST_METRICS_READ_ONLY	0x1UL
 #define __DST_METRICS_PTR(Y)	\
diff --git a/net/core/dst.c b/net/core/dst.c
index 069d51d..56d6361 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -149,7 +149,15 @@  int dst_discard(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(dst_discard);
 
-const u32 dst_default_metrics[RTAX_MAX];
+const u32 dst_default_metrics[RTAX_MAX + 1] = {
+	/* This initializer is needed to force linker to place this variable
+	 * into const section. Otherwise it might end into bss section.
+	 * We really want to avoid false sharing on this variable, and catch
+	 * any writes on it.
+	 */
+	[RTAX_MAX] = 0xdeadbeef,
+};
+
 
 void *dst_alloc(struct dst_ops *ops, struct net_device *dev,
 		int initial_ref, int initial_obsolete, unsigned short flags)