Patchwork [cxx-mem-model] bitfield tests

Submitter Aldy Hernandez
Date March 30, 2011, 2:11 p.m.
Message ID <4D933A2E.9030105@redhat.com>
Permalink /patch/88924/
State New

Comments

Aldy Hernandez - March 30, 2011, 2:11 p.m.
> The memory model is not implementable on strict-alignment targets
> that do not have a byte store operation.  But we previously said that ;)

Yes.  I think we should issue an error when we have such a target and 
the user tries -fmemory-model=c++0x.  However, how many strict-alignment 
targets are not byte addressable nowadays?

> Also consider global vars
>
> char a;
> char b;
>
> accessing them on strict-align targets may access adjacent globals
> (that's a problem anyway, also with alias analysis).

Good point.  I am adding a test to that effect (see attached patch).

BTW, I assume you mean strict-align targets WITHOUT byte-addressability 
as above.  I have spot-checked your scenario on a handful of important 
targets that have strict alignment, and all of them work without 
touching adjacent global vars:

	arm-elf		OK
	sparc-linux	OK
	ia64-linux	OK
	alpha-linux	OK, but only with -mbwx (byte addressability)

rth tells me that we shouldn't worry about ancient non-byte addressable 
Alphas, so the last isn't an issue.

So... do you have any important targets in mind, because I don't see 
this being a problem for most targets?  As can be expected, I am only 
interested in x86*, powerpc*, and s390, especially since a cursory 
glance at other important targets didn't exhibit any problems.  However, 
given my target bias, I am willing to look into any important targets 
that are problematic (I'm hoping none :)).

Let me know if you see anything else, and please take a quick peek at 
the attached patch below, which I will be committing shortly.

As usual, thanks.
Aldy
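
For illustration only (this is not part of the patch), the hazard with adjacent `char a; char b;` can be replayed deterministically in one thread.  The sketch below assumes a hypothetical target that implements a byte store as a word-wide read-modify-write; the names `packed_globals`, `rmw_load`, `rmw_store` and `lost_update_demo` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Two adjacent one-byte "globals" packed into one 32-bit word,
   mimicking `char a; char b;` laid out next to each other.
   bytes[0] plays `a`, bytes[1] plays `b`.  */
typedef union {
  uint32_t word;
  unsigned char bytes[4];
} packed_globals;

/* Step 1 of the emulated byte store: load the whole word.  */
uint32_t rmw_load (const packed_globals *g)
{
  return g->word;
}

/* Step 2: patch one byte in the loaded copy and write the whole word
   back, including the (possibly stale) copies of the other bytes.  */
void rmw_store (packed_globals *g, uint32_t loaded,
                unsigned idx, unsigned char val)
{
  packed_globals tmp;
  assert (idx < sizeof tmp.bytes);
  tmp.word = loaded;
  tmp.bytes[idx] = val;
  g->word = tmp.word;
}

/* Replays one unlucky interleaving: "thread 1" starts storing to `a`,
   "thread 2" stores to `b` in between.  Returns the final `b`.  */
unsigned char lost_update_demo (void)
{
  packed_globals g;
  memset (&g, 0, sizeof g);
  g.bytes[1] = 77;                  /* b = 77 initially */

  uint32_t t1 = rmw_load (&g);      /* thread 1: load the word */
  g.bytes[1] = 99;                  /* thread 2: b = 99 */
  rmw_store (&g, t1, 0, 66);        /* thread 1: write back with a = 66 */

  return g.bytes[1];                /* thread 2's store has been undone */
}
```

In this replay `b` ends up holding its old value 77 even though the second "thread" wrote 99, which is the kind of clobbering the new test is meant to catch.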
Richard Guenther - March 30, 2011, 2:19 p.m.
On Wed, Mar 30, 2011 at 4:11 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>
>> The memory model is not implementable on strict-alignment targets
>> that do not have a byte store operation.  But we previously said that ;)
>
> Yes.  I think we should issue an error when we have such a target and the
> user tries -fmemory-model=c++0x.  However, how many strict-alignment targets
> are not byte addressable nowadays?
>
>> Also consider global vars
>>
>> char a;
>> char b;
>>
>> accessing them on strict-align targets may access adjacent globals
>> (that's a problem anyway, also with alias analysis).
>
> Good point.  I am adding a test to that effect (see attached patch).
>
> BTW, I assume you mean strict-align targets WITHOUT byte-addressability as
> above.  I have spot-checked your scenario on a handful of important targets
> that have strict alignment, and all of them work without touching adjacent
> global vars:
>
>        arm-elf         OK
>        sparc-linux     OK
>        ia64-linux      OK
>        alpha-linux     OK, but only with -mbwx (byte addressability)
>
> rth tells me that we shouldn't worry about ancient non-byte addressable
> Alphas, so the last isn't an issue.
>
> So... do you have any important targets in mind, because I don't see this
> being a problem for most targets?  As can be expected, I am only interested
> in x86*, powerpc*, and s390, especially since a cursory glance on other
> important targets didn't exhibit any problems.  However, given my target
> bias, I am willing to look into any important targets that are problematic
> (I'm hoping none :)).

Well, I'm not sure that strict-align targets that provide byte access do
not simply hide the issue inside the CPU (thus, perform the read-modify-write
there and do not guarantee any atomicity unless you ask for it).  It might
be even worse - targets might not even guarantee this for shared cache-lines
(for non-ccNUMA architectures).  I'm no expert here, but certainly
every possible weird CPU architecture has been implemented.

Richard.
Michael Matz - March 30, 2011, 2:25 p.m.
Hi,

On Wed, 30 Mar 2011, Aldy Hernandez wrote:

> 
> > The memory model is not implementable on strict-alignment targets
> > that do not have a byte store operation.  But we previously said that ;)
> 
> Yes.  I think we should issue an error when we have such a target and the user
> tries -fmemory-model=c++0x.  However, how many strict-alignment targets are
> not byte addressable nowadays?

Consider cache aliasing, where the unit of coherence (absent using atomic 
instructions) is for instance 64 bytes.  I'm not sure how the mem-model 
could be implemented without generally falling back to atomics.

Or CPU internal write buffers that could (again if there are just normal 
writes, not atomics) reorder or merge write requests.  I think that 
would also destroy guarantees that the cxx-mem-model tries to provide.


Ciao,
Michael.
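
As a rough sketch of what "falling back to atomics" could look like for a single byte store, the following uses a C11 compare-and-swap loop on the containing word.  This is an illustration under assumptions, not what GCC emits: the helper names `atomic_byte_store` and `byte_of` are hypothetical, and bytes are numbered from the least significant end of the word rather than by address.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Store VAL into byte IDX (0..3, counted from the least significant
   byte) of *WORD without disturbing the other three bytes, even if
   other threads update them concurrently.  */
void atomic_byte_store (_Atomic uint32_t *word, unsigned idx, uint8_t val)
{
  assert (idx < 4);
  unsigned shift = idx * 8;
  uint32_t mask = (uint32_t) 0xff << shift;
  uint32_t old = atomic_load (word);
  uint32_t desired;

  /* On failure the CAS refreshes OLD with the current contents, so
     DESIRED is rebuilt from up-to-date neighbouring bytes.  */
  do
    desired = (old & ~mask) | ((uint32_t) val << shift);
  while (!atomic_compare_exchange_weak (word, &old, desired));
}

/* Read back byte IDX of *WORD, using the same numbering.  */
uint8_t byte_of (_Atomic uint32_t *word, unsigned idx)
{
  assert (idx < 4);
  return (uint8_t) (atomic_load (word) >> (idx * 8));
}
```

Every plain byte store turning into a loop like this is the cost being worried about here.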
Aldy Hernandez - March 30, 2011, 2:26 p.m.
>> So... do you have any important targets in mind, because I don't see this
>> being a problem for most targets?  As can be expected, I am only interested
>> in x86*, powerpc*, and s390, especially since a cursory glance on other
>> important targets didn't exhibit any problems.  However, given my target
>> bias, I am willing to look into any important targets that are problematic
>> (I'm hoping none :)).
>
> Well, I'm not sure that strict-align targets that provide byte access do
> not simply hide the issue inside the CPU (thus, perform the read-modify-write
> there and do not guarantee any atomicity unless you ask for it).  It might
> be even worse - targets might not even guarantee this for shared cache-lines
> (for non-ccNUMA architectures).  But I'm no expert here, but certainly
> every possible weird CPU architecture has been implemented.

Whoops, sorry I missed your off-list followup from yesterday (I'm 
reading mail sequentially :)):

 > Richard Guenther said:
 > strict-align targets will end up doing read-modify-write operations on
 > word-size even when accessing single bytes.  Note that some CPUs
 > have byte store operations but they usually are not guaranteed to
 > be "atomic" (thus, they simply do the read-modify-write in the CPU).
 > I am not aware of any strict-align CPU that can do atomic byte stores.
 >
 > Obvious problem when for example having multiple non-word-size
 > global vars (unless you force them to word-alignment).

I was not aware of how this played out internally.  This is certainly a 
problem.  I will hunt down hardware for at least arm, sparc, and ia64, 
and investigate.  But it may be that the only option will be to disallow 
the C++ memory model on strict-alignment hardware, or perhaps force 
word-alignment.

Is forcing word-alignment too big of a hammer, or will the users for 
these architectures be content with having no support for the C++0x 
memory model?

Aldy
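
A minimal sketch of the "force word-alignment" option, assuming the word size equals `sizeof (long)` (an assumption for this example, not something established in the thread).  It uses the C11 `_Alignas` specifier; `WORD_BYTES`, `word_index` and `globals_in_distinct_words` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed word size for the illustration; a real target would use its
   own natural word width.  */
#define WORD_BYTES (sizeof (long))

/* Each char gets its own word-aligned slot, so a word-wide
   read-modify-write of one cannot cover the other.  */
_Alignas (sizeof (long)) char a = 0;
_Alignas (sizeof (long)) char b = 77;

/* Index of the aligned word that contains P.  */
uintptr_t word_index (const void *p)
{
  return (uintptr_t) p / WORD_BYTES;
}

/* Nonzero when `a` and `b` live in different words.  */
int globals_in_distinct_words (void)
{
  assert ((uintptr_t) &a % WORD_BYTES == 0);
  assert ((uintptr_t) &b % WORD_BYTES == 0);
  return word_index (&a) != word_index (&b);
}
```

The price is padding: every sub-word global grows to a full word, which is why this may be too big a hammer.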
Richard Guenther - March 30, 2011, 2:40 p.m.
On Wed, Mar 30, 2011 at 4:26 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>
>>> So... do you have any important targets in mind, because I don't see this
>>> being a problem for most targets?  As can be expected, I am only
>>> interested
>>> in x86*, powerpc*, and s390, especially since a cursory glance on other
>>> important targets didn't exhibit any problems.  However, given my target
>>> bias, I am willing to look into any important targets that are
>>> problematic
>>> (I'm hoping none :)).
>>
>> Well, I'm not sure that strict-align targets that provide byte access do
>> not simply hide the issue inside the CPU (thus, perform the
>> read-modify-write
>> there and do not guarantee any atomicity unless you ask for it).  It might
>> be even worse - targets might not even guarantee this for shared
>> cache-lines
>> (for non-ccNUMA architectures).  But I'm no expert here, but certainly
>> every possible weird CPU architecture has been implemented.
>
> Whoops, sorry I missed your off-list followup from yesterday (I'm reading
> mail sequentially :)):
>
>> Richard Guenther said:
>> strict-align targets will end up doing read-modify-write operations on
>> word-size even when accessing single bytes.  Note that some CPUs
>> have byte store operations but they usually are not guaranteed to
>> be "atomic" (thus, they simply do the read-modify-write in the CPU).
>> I am not aware of any strict-align CPU that can do atomic byte stores.
>>
>> Obvious problem when for example having multiple non-word-size
>> global vars (unless you force them to word-alignment).
>
> I was not aware of how this played out internally.  This is certainly a
> problem.  I will hunt down hardware for at least arm, sparc, and ia64, and
> investigate.  But it may be that the only option will be to disallow the C++
> memory model on strictly aligned hardware, or perhaps force word-alignment.
>
> Is forcing word-alignment too big of a hammer, or will the users for these
> architectures be content with having no support for the C++0x memory model?

I think a memory model that cannot be reasonably (read: also fast) implemented
on all HW is screwed from the start and we should simply ditch it.  Nobody
will use it: either you cannot rely on it when writing portable programs,
or it will be hellishly slow.

Richard.
Mike Stump - March 30, 2011, 3:05 p.m.
On Mar 30, 2011, at 7:40 AM, Richard Guenther wrote:
>> Is forcing word-alignment too big of a hammer, or will the users for these
>> architectures be content with having no support for the C++0x memory model?
> 
> I think a memory model that cannot be reasonably (read: also fast) implemented
> on all HW is screwed from the start and we should simply ditch it.  Which
> is because nobody will use it as you cannot rely on it when writing
> portable programs or it will be hell slow.

I agree 100%.  If the standards people can't write a decent standard, they
ought not write it.  I torpedoed someone refining volatile, which would have
been nice to have, because people were laying tracks down the wrong way.
Nuke 'em from orbit, I say.  Now, I'm sure we have it all wrong and the
standard is entirely reasonable...  right?
Jeff Law - March 31, 2011, 2:47 p.m.
On 03/30/11 08:19, Richard Guenther wrote:

> 
> Well, I'm not sure that strict-align targets that provide byte access do
> not simply hide the issue inside the CPU (thus, perform the read-modify-write
> there and do not guarantee any atomicity unless you ask for it).
Certainly some do this internally, but that's clearly out of our
control.  However, some really do sub-word accesses.

I even vaguely remember this being controllable by bits in page table
entries on one architecture.  With the bit set, a byte access was
performed byte-wise; otherwise the processor would do a
read-modify-write.  Clearly this was meant to make dealing with
memory-mapped devices easier.

Jeff
Richard Guenther - March 31, 2011, 3:28 p.m.
On Thu, Mar 31, 2011 at 4:47 PM, Jeff Law <law@redhat.com> wrote:
> On 03/30/11 08:19, Richard Guenther wrote:
>
>>
>> Well, I'm not sure that strict-align targets that provide byte access do
>> not simply hide the issue inside the CPU (thus, perform the read-modify-write
>> there and do not guarantee any atomicity unless you ask for it).
> Certainly some do this internally, but that's clearly out of our
> control.

Sure.  My argument is that the memory model which guarantees
this kind of thing for _any_ memory access is fundamentally flawed.
They should have simply required annotating objects which should
behave that way (and then only behave that way "per object", not
for any concurrent field accesses).

Richard.

> However, some really do sub-word accesses.
>
> I even vaguely remember this being controllable by bits in page table
> entries on one architecture.  You could set the bit which meant if I ask
> for a byte access, then do it byte-wise, otherwise the processor would
> do a read-modify-write.  Clearly this was meant to make it easier for
> dealing with memory mapped devices.
>
> Jeff
Richard Henderson - April 1, 2011, 4:24 p.m.
On 03/31/2011 08:28 AM, Richard Guenther wrote:
>>> Well, I'm not sure that strict-align targets that provide byte access do
>>> not simply hide the issue inside the CPU (thus, perform the read-modify-write
>>> there and do not guarantee any atomicity unless you ask for it).
>> Certainly some do this internally, but that's clearly out of our
>> control.
> 
> Sure.  My argument is that the memory model which guarantees
> this kind of things for _any_ memory access is fundamentally flawed.
> They should have simply required annotating objects which should
> behave that way (and then only behave that way "per object", not
> for any concurrent field accesses).

(0) Let's limit our discussion to cpus that are actually put into SMP systems,
    and have been manufactured in the last decade.

(1) Do we agree that all such cpus have user-level store insns with byte
    granularity?  Honestly the only non-microcontroller I ever heard of 
    without this was the original Alpha.  Which is excluded per (0).

(2) Do we agree that all such cpus have on-chip caches?

(3) Let us at this point limit our discussion to cacheable, i.e. non-I/O,
    memory.  I believe we can agree that all sorts of system-dependent stuff
    happens in memory-mapped registers.

(4) Do we agree that all such cpus transfer entire cachelines to and fro
    the memory bus?  And further that they simultaneously transfer a 
    modification mask as part of their cache coherency protocol?

(5) Do we agree that all such cpus use a byte-granular modification mask?

I'm guessing that you don't actually agree on point (5), but ... honestly,
please name the offender because I can't think of one.  For the mainstream
processors we really care about, I think every one of them Does The Right Thing.



r~
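
To make point (5) concrete, here is a toy model of a byte-granular modification mask.  It is a deliberately simplified sketch: real coherency protocols are considerably more involved, and the names `cache_line`, `line_fill`, `line_store_byte`, `line_writeback` and `both_stores_survive` are invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 64

/* A private copy of one cache line plus one dirty bit per byte.  */
typedef struct {
  uint8_t data[LINE_BYTES];
  uint64_t dirty;               /* bit i set => data[i] written locally */
} cache_line;

/* Fetch the line from memory with nothing marked dirty.  */
void line_fill (cache_line *l, const uint8_t *mem)
{
  memcpy (l->data, mem, LINE_BYTES);
  l->dirty = 0;
}

/* A byte store touches one byte and sets only that byte's mask bit.  */
void line_store_byte (cache_line *l, unsigned off, uint8_t val)
{
  assert (off < LINE_BYTES);
  l->data[off] = val;
  l->dirty |= (uint64_t) 1 << off;
}

/* Write back: only bytes flagged dirty are merged into memory.  */
void line_writeback (cache_line *l, uint8_t *mem)
{
  for (unsigned i = 0; i < LINE_BYTES; i++)
    if (l->dirty & ((uint64_t) 1 << i))
      mem[i] = l->data[i];
  l->dirty = 0;
}

/* Two "cpus" hold the same line and write neighbouring bytes.
   Returns 1 if both stores reach memory.  */
int both_stores_survive (void)
{
  uint8_t mem[LINE_BYTES] = { 0 };
  cache_line cpu0, cpu1;

  line_fill (&cpu0, mem);
  line_fill (&cpu1, mem);
  line_store_byte (&cpu0, 0, 66);
  line_store_byte (&cpu1, 1, 99);
  line_writeback (&cpu0, mem);
  line_writeback (&cpu1, mem);

  return mem[0] == 66 && mem[1] == 99;
}
```

With a coarser mask (say one bit per word) the second write-back in this model would overwrite byte 0 with its stale copy, which is the failure mode the question in (5) is probing for.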
Andrew Pinski - April 1, 2011, 7:42 p.m.
On Fri, Apr 1, 2011 at 9:24 AM, Richard Henderson <rth@redhat.com> wrote:
> (1) Do we agree that all such cpus have user-level store insns with byte
>    granularity?  Honestly the only non-microcontroller I ever heard of
>    without this was the original Alpha.  Which is excluded per (0).

And SPU, which is excluded per (0) because it is not SMP but rather
AMP, as it does not share memory.

-- Pinski
Richard Guenther - April 2, 2011, 7:56 a.m.
On Fri, Apr 1, 2011 at 6:24 PM, Richard Henderson <rth@redhat.com> wrote:
> On 03/31/2011 08:28 AM, Richard Guenther wrote:
>>>> Well, I'm not sure that strict-align targets that provide byte access do
>>>> not simply hide the issue inside the CPU (thus, perform the read-modify-write
>>>> there and do not guarantee any atomicity unless you ask for it).
>>> Certainly some do this internally, but that's clearly out of our
>>> control.
>>
>> Sure.  My argument is that the memory model which guarantees
>> this kind of things for _any_ memory access is fundamentally flawed.
>> They should have simply required annotating objects which should
>> behave that way (and then only behave that way "per object", not
>> for any concurrent field accesses).
>
> (0) Let's limit our discussion to cpus that are actually put into SMP systems,
>    and have been manufactured in the last decade.
>
> (1) Do we agree that all such cpus have user-level store insns with byte
>    granularity?  Honestly the only non-microcontroller I ever heard of
>    without this was the original Alpha.  Which is excluded per (0).
>
> (2) Do we agree that all such cpus have on-chip caches?
>
> (3) Let us at this point limit our discussion to cacheable, i.e. non-I/O,
>    memory.  I believe we can agree that all sorts of system-dependent stuff
>    happens in memory-mapped registers.
>
> (4) Do we agree that all such cpus transfer entire cachelines to and fro
>    the memory bus?  And further that they simultaneously transfer a
>    modification mask as part of their cache coherency protocol?
>
> (5) Do we agree that all such cpus use a byte-granular modification mask?
>
> I'm guessing that you don't actually agree on point (5), but ... honestly,
> please name the offender because I can't think of one.  For the mainstream
> processors we really care about, I think every one of them Does The Right Thing.

Yes, we don't agree on (5).  I can't name a CPU; I was just guessing
that strict-alignment CPUs would have such a requirement partly to make
their store queues simpler (no need for such a mask).

Now, as for (0) I might agree to disregard the original Alpha, but as the
embedded world moves to SMP I'm not sure we can disregard
non-cache coherent NUMA setups or even CPUs without a byte store.

But well, I guess the thing I don't like about the standard is that it makes
people who have started to be somewhat aware of threading issues
_less_ aware of them by providing some "false" safety to them.  It
really smells like a standard designed for a very high-level language
where people don't have to think instead of a standard suitable for a
C family language.

Richard.

>
>
> r~
>
Aldy Hernandez - April 4, 2011, 12:56 p.m.
> But well, I guess the thing I don't like about the standard is that it makes
> people that have started to be somewhat aware about threading issues
> _less_ aware of them by providing some "false" safety to them.  It
> really smells like a standard designed for a very high-level language
> where people don't have to think instead of a standard suitable for a
> C family language.

Well, that's not exactly true.  You still need to think about threading.
All the standard is doing is guaranteeing that if you already have a 
data race free program, the compiler won't add additional races not 
already there.

But I'm not a C++ guy.  I am no advocate for the standard.  I'm just 
implementing stuff.  Ahem, I'm just a soldier in this war :).

Aldy
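
One way to picture "the compiler won't add additional races": the sketch below contrasts a conditional store as written with a hypothetical speculated form that stores unconditionally and then undoes it.  The write counter exists only to make the extra stores visible; `tracked_int`, `tracked_write`, `store_if` and `store_if_speculated` are illustrative names, not GCC internals.

```c
#include <assert.h>

/* A shared int that counts how often it is written.  */
typedef struct {
  int value;
  unsigned writes;
} tracked_int;

void tracked_write (tracked_int *t, int v)
{
  assert (t != 0);
  t->value = v;
  t->writes++;
}

/* Source as written: stores only when FLAG is nonzero.  */
void store_if (tracked_int *shared, int flag)
{
  if (flag)
    tracked_write (shared, 1);
}

/* Hypothetical transformed form: store unconditionally, then restore
   the old value when FLAG is zero.  Same result single-threaded, but
   it writes to SHARED on a path where the source never did.  */
void store_if_speculated (tracked_int *shared, int flag)
{
  int saved = shared->value;
  tracked_write (shared, 1);
  if (!flag)
    tracked_write (shared, saved);
}
```

With `flag == 0` the first form performs no write at all, while the second performs two; if another thread legitimately owns the variable on that path, the second form has introduced a race that the source did not contain.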
Aldy Hernandez - April 4, 2011, 12:58 p.m.
>> (5) Do we agree that all such cpus use a byte-granular modification mask?

> Now, as of (0) I might agree to disregard the original Alpha, but as the
> embedded world moves to SMP I'm not sure we can disregard
> non-cache coherent NUMA setups or even CPUs without a byte store.

As per (5), it doesn't matter if the CPU lacks a byte store, since the 
cache has a byte-granular modification mask.
Jeff Law - April 4, 2011, 6:05 p.m.
On 04/02/11 01:56, Richard Guenther wrote:

> But well, I guess the thing I don't like about the standard is that it makes
> people that have started to be somewhat aware about threading issues
> _less_ aware of them by providing some "false" safety to them.  It
> really smells like a standard designed for a very high-level language
> where people don't have to think instead of a standard suitable for a
> C family language.
I agree it's unfortunate, but there's a general trend of finding ways to
get more out of less experienced programmers.  One of the ways to do
that is to simplify the problem space these guys have to look at.

For better or worse, it's a trend I see continuing indefinitely.
Obviously we're starting to get off-topic..


jeff
Michael Matz - April 6, 2011, 3:29 p.m.
Hi,

On Mon, 4 Apr 2011, Aldy Hernandez wrote:

> 
> > > (5) Do we agree that all such cpus use a byte-granular modification mask?
> 
> > Now, as of (0) I might agree to disregard the original Alpha, but as the
> > embedded world moves to SMP I'm not sure we can disregard
> > non-cache coherent NUMA setups or even CPUs without a byte store.
> 
> As per 5, it doesn't matter if the CPU lacks a byte store, since the 
> cache has a byte-granular modification mask.

If it doesn't have byte stores there's no need for byte-granular 
modification masks :)


Ciao,
Michael.
Aldy Hernandez - April 6, 2011, 5:16 p.m.
On 04/06/11 10:29, Michael Matz wrote:
> Hi,
>
> On Mon, 4 Apr 2011, Aldy Hernandez wrote:
>
>>
>>>> (5) Do we agree that all such cpus use a byte-granular modification mask?
>>
>>> Now, as of (0) I might agree to disregard the original Alpha, but as the
>>> embedded world moves to SMP I'm not sure we can disregard
>>> non-cache coherent NUMA setups or even CPUs without a byte store.
>>
>> As per 5, it doesn't matter if the CPU lacks a byte store, since the
>> cache has a byte-granular modification mask.
>
> If it doesn't have byte stores there's no need for byte-granular
> modification masks :)

I was talking about a CPU with a byte store that is implemented in the 
microcode with a wider operation and logical operations that may touch 
adjacent fields.  If adjacent bytes were touched, the cache would be 
updated accordingly, hence the byte-granular modification mask.  That's 
my understanding anyhow.

Patch

Index: testsuite/gcc.dg/memmodel/strict-align-global.c
===================================================================
--- testsuite/gcc.dg/memmodel/strict-align-global.c	(revision 0)
+++ testsuite/gcc.dg/memmodel/strict-align-global.c	(revision 0)
@@ -0,0 +1,46 @@ 
+/* { dg-do link } */
+/* { dg-options "-O2 --param allow-packed-store-data-races=0" } */
+/* { dg-final { memmodel-gdb-test } } */
+
+#include <stdio.h>
+#include "memmodel.h"
+
+/* This test verifies writes to globals do not write to adjacent
+   globals.  This mostly happens on strict-align targets that are not
+   byte addressable (old Alphas, etc).  */
+
+char a = 0;
+char b = 77;
+
+void memmodel_other_threads() 
+{
+}
+
+int memmodel_step_verify()
+{
+  if (b != 77)
+    {
+      printf("FAIL: Unexpected value.  <b> is %d, should be 77\n", b);
+      return 1;
+    }
+  return 0;
+}
+
+/* Verify that every variable has the correct value.  */
+int memmodel_final_verify()
+{
+  int ret = memmodel_step_verify ();
+  if (a != 66)
+    {
+      printf("FAIL: Unexpected value.  <a> is %d, should be 66\n", a);
+      return 1;
+    }
+  return ret;
+}
+
+int main ()
+{
+  a = 66;
+  memmodel_done();
+  return 0;
+}