Message ID | 4D933A2E.9030105@redhat.com |
---|---|
State | New |
On Wed, Mar 30, 2011 at 4:11 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>
>> The memory model is not implementable on strict-alignment targets
>> that do not have a byte store operation. But we previously said that ;)
>
> Yes. I think we should issue an error when we have such a target and the
> user tries -fmemory-model=c++0x. However, how many strict-alignment targets
> are not byte addressable nowadays?
>
>> Also consider global vars
>>
>> char a;
>> char b;
>>
>> accessing them on strict-align targets may access adjacent globals
>> (that's a problem anyway, also with alias analysis).
>
> Good point. I am adding a test to that effect (see attached patch).
>
> BTW, I assume you mean strict-align targets WITHOUT byte-addressability as
> above. I have spot-checked your scenario on a handful of important targets
> that have strict alignment, and all of them work without touching adjacent
> global vars:
>
>     arm-elf       OK
>     sparc-linux   OK
>     ia64-linux    OK
>     alpha-linux   OK, but only with -mbwx (byte addressability)
>
> rth tells me that we shouldn't worry about ancient non-byte addressable
> Alphas, so the last isn't an issue.
>
> So... do you have any important targets in mind, because I don't see this
> being a problem for most targets? As can be expected, I am only interested
> in x86*, powerpc*, and s390, especially since a cursory glance on other
> important targets didn't exhibit any problems. However, given my target
> bias, I am willing to look into any important targets that are problematic
> (I'm hoping none :)).

Well, I'm not sure that strict-align targets that provide byte access do
not simply hide the issue inside the CPU (thus, perform the read-modify-write
there and do not guarantee any atomicity unless you ask for it). It might
be even worse - targets might not even guarantee this for shared cache-lines
(for non-ccNUMA architectures). But I'm no expert here, but certainly
every possible weird CPU architecture has been implemented.

Richard.
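To make the adjacent-globals concern concrete, here is a minimal sketch of how a byte store might be emulated on a hypothetical target that only has word-sized stores. The names `insert_byte`, `extract_byte`, `emulated_byte_store` and `lost_update_demo`, and the packing of `a` and `b` into one 32-bit word, are illustrative assumptions and are not taken from any GCC back end. The last function replays one unlucky interleaving by hand in a single thread, so the lost update is deterministic rather than timing dependent.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: a and b are adjacent chars packed into one 32-bit word.
   Bytes are numbered by shift amount, so host endianness does not matter.  */
enum { BYTE_A = 0, BYTE_B = 1 };

/* Replace one byte inside an already-loaded word value.  */
uint32_t insert_byte(uint32_t word, unsigned idx, uint8_t v)
{
  assert(idx < 4);
  uint32_t mask = (uint32_t)0xff << (8 * idx);
  return (word & ~mask) | ((uint32_t)v << (8 * idx));
}

/* Read one byte out of a word value.  */
uint8_t extract_byte(uint32_t word, unsigned idx)
{
  assert(idx < 4);
  return (uint8_t)(word >> (8 * idx));
}

/* What a byte store becomes when only word stores exist:
   load the whole word, patch one byte, store the whole word.  */
void emulated_byte_store(uint32_t *word, unsigned idx, uint8_t v)
{
  uint32_t tmp = *word;              /* read */
  tmp = insert_byte(tmp, idx, v);    /* modify */
  *word = tmp;                       /* write: rewrites the neighbours too */
}

/* Hand-replayed interleaving: "thread 1" sets a = 66 while "thread 2"
   sets b = 1 between thread 1's load and its store.
   Returns the final value of b.  */
uint8_t lost_update_demo(void)
{
  uint32_t word = insert_byte(0, BYTE_B, 77);   /* a = 0, b = 77 */
  uint32_t t1 = word;                           /* thread 1: load */
  emulated_byte_store(&word, BYTE_B, 1);        /* thread 2: b = 1 */
  word = insert_byte(t1, BYTE_A, 66);           /* thread 1: store, stale b */
  return extract_byte(word, BYTE_B);
}
```

In this replay thread 2's write to `b` is overwritten by thread 1's write-back of the stale word, even though thread 1 only meant to change `a`. Whether real hardware behaves like this is exactly what the thread is debating.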
Hi,

On Wed, 30 Mar 2011, Aldy Hernandez wrote:

>
> > The memory model is not implementable on strict-alignment targets
> > that do not have a byte store operation. But we previously said that ;)
>
> Yes. I think we should issue an error when we have such a target and the user
> tries -fmemory-model=c++0x. However, how many strict-alignment targets are
> not byte addressable nowadays?

Consider cache aliasing, where the unit of coherence (absent using atomic
instructions) is for instance 64 bytes. I'm not sure how the mem-model
could be implemented without generally falling back to atomics. Or CPU
internal write buffers that could (again if there are just normal writes,
not atomics) reorder or merge write requests. I think also that would
destroy guarantees that the cxx-mem-model tries to provide.

Ciao,
Michael.
>> So... do you have any important targets in mind, because I don't see this
>> being a problem for most targets? As can be expected, I am only interested
>> in x86*, powerpc*, and s390, especially since a cursory glance on other
>> important targets didn't exhibit any problems. However, given my target
>> bias, I am willing to look into any important targets that are problematic
>> (I'm hoping none :)).
>
> Well, I'm not sure that strict-align targets that provide byte access do
> not simply hide the issue inside the CPU (thus, perform the read-modify-write
> there and do not guarantee any atomicity unless you ask for it). It might
> be even worse - targets might not even guarantee this for shared cache-lines
> (for non-ccNUMA architectures). But I'm no expert here, but certainly
> every possible weird CPU architecture has been implemented.

Whoops, sorry I missed your off-list followup from yesterday (I'm reading
mail sequentially :)):

> Richard Guenther said:
> strict-align targets will end up doing read-modify-write operations on
> word-size even when accessing single bytes. Note that some CPUs
> have byte store operations but they usually are not guaranteed to
> be "atomic" (thus, they simply do the read-modify-write in the CPU).
> I am not aware of any strict-align CPU that can do atomic byte stores.
>
> Obvious problem when for example having multiple non-word-size
> global vars (unless you force them to word-alignment).

I was not aware of how this played out internally. This is certainly a
problem. I will hunt down hardware for at least arm, sparc, and ia64, and
investigate. But it may be that the only option will be to disallow the C++
memory model on strictly aligned hardware, or perhaps force word-alignment.

Is forcing word-alignment too big of a hammer, or will the users for these
architectures be content with having no support for the C++0x memory model?

Aldy
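As a rough illustration of the "force word-alignment" option, the sketch below uses C11 `alignas` to give each small global a word to itself. It assumes that "word" means `sizeof(long)` on the target and that converting a pointer to `uintptr_t` yields a flat address, both of which are assumptions for illustration. `WORD_SIZE`, `word_index` and `in_separate_words` are hypothetical helper names, and this is source-level padding rather than anything the compiler patch itself does.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a "word" is sizeof(long) bytes on this target.  */
#define WORD_SIZE sizeof(long)

/* Left alone, two chars may well end up in the same word.  */
char packed_a;
char packed_b;

/* With forced alignment each char starts its own word, so a word-wide
   read-modify-write of one cannot touch the other.  The cost is padding:
   every such char now occupies a full word.  */
alignas(WORD_SIZE) char padded_a;
alignas(WORD_SIZE) char padded_b;

/* Index of the aligned word that contains address p.  */
uintptr_t word_index(const void *p)
{
  assert(p != NULL);
  return (uintptr_t)p / WORD_SIZE;
}

/* Nonzero when a single word-wide store cannot cover both objects.  */
int in_separate_words(const void *p, const void *q)
{
  return word_index(p) != word_index(q);
}
```

`in_separate_words(&padded_a, &padded_b)` is always true here because both addresses are distinct multiples of `WORD_SIZE`, while the result for `packed_a` and `packed_b` depends on how the linker lays them out. The padding cost is why the question of whether this is too big a hammer comes up.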
On Wed, Mar 30, 2011 at 4:26 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>
>>> So... do you have any important targets in mind, because I don't see this
>>> being a problem for most targets? As can be expected, I am only interested
>>> in x86*, powerpc*, and s390, especially since a cursory glance on other
>>> important targets didn't exhibit any problems. However, given my target
>>> bias, I am willing to look into any important targets that are problematic
>>> (I'm hoping none :)).
>>
>> Well, I'm not sure that strict-align targets that provide byte access do
>> not simply hide the issue inside the CPU (thus, perform the read-modify-write
>> there and do not guarantee any atomicity unless you ask for it). It might
>> be even worse - targets might not even guarantee this for shared cache-lines
>> (for non-ccNUMA architectures). But I'm no expert here, but certainly
>> every possible weird CPU architecture has been implemented.
>
> Whoops, sorry I missed your off-list followup from yesterday (I'm reading
> mail sequentially :)):
>
>> Richard Guenther said:
>> strict-align targets will end up doing read-modify-write operations on
>> word-size even when accessing single bytes. Note that some CPUs
>> have byte store operations but they usually are not guaranteed to
>> be "atomic" (thus, they simply do the read-modify-write in the CPU).
>> I am not aware of any strict-align CPU that can do atomic byte stores.
>>
>> Obvious problem when for example having multiple non-word-size
>> global vars (unless you force them to word-alignment).
>
> I was not aware of how this played out internally. This is certainly a
> problem. I will hunt down hardware for at least arm, sparc, and ia64, and
> investigate. But it may be that the only option will be to disallow the C++
> memory model on strictly aligned hardware, or perhaps force word-alignment.
>
> Is forcing word-alignment too big of a hammer, or will the users for these
> architectures be content with having no support for the C++0x memory model?

I think a memory model that cannot be reasonably (read: also fast) implemented
on all HW is screwed from the start and we should simply ditch it. Which
is because nobody will use it as you cannot rely on it when writing
portable programs or it will be hell slow.

Richard.
On Mar 30, 2011, at 7:40 AM, Richard Guenther wrote:
>> Is forcing word-alignment too big of a hammer, or will the users for these
>> architectures be content with having no support for the C++0x memory model?
>
> I think a memory model that cannot be reasonably (read: also fast) implemented
> on all HW is screwed from the start and we should simply ditch it. Which
> is because nobody will use it as you cannot rely on it when writing
> portable programs or it will be hell slow.

I agree 100%. If the standards people can't write a decent standard, they
ought not write it. I torpedoed someone refining volatile, which would have
been nice to have, because people were laying tracks down the wrong way.
Nuke em from orbit I say.

Now, I'm sure we have it all wrong and the standard is entirely
reasonable... right?
On 03/30/11 08:19, Richard Guenther wrote:
>
> Well, I'm not sure that strict-align targets that provide byte access do
> not simply hide the issue inside the CPU (thus, perform the read-modify-write
> there and do not guarantee any atomicity unless you ask for it).

Certainly some do this internally, but that's clearly out of our
control.

However, some really do sub-word accesses.

I even vaguely remember this being controllable by bits in page table
entries on one architecture. You could set the bit which meant if I ask
for a byte access, then do it byte-wise, otherwise the processor would
do a read-modify-write. Clearly this was meant to make it easier for
dealing with memory mapped devices.

Jeff
On Thu, Mar 31, 2011 at 4:47 PM, Jeff Law <law@redhat.com> wrote:
> On 03/30/11 08:19, Richard Guenther wrote:
>>
>> Well, I'm not sure that strict-align targets that provide byte access do
>> not simply hide the issue inside the CPU (thus, perform the read-modify-write
>> there and do not guarantee any atomicity unless you ask for it).
> Certainly some do this internally, but that's clearly out of our
> control.

Sure. My argument is that the memory model which guarantees
this kind of things for _any_ memory access is fundamentally flawed.
They should have simply required annotating objects which should
behave that way (and then only behave that way "per object", not
for any concurrent field accesses).

Richard.

> However, some really do sub-word accesses.
>
> I even vaguely remember this being controllable by bits in page table
> entries on one architecture. You could set the bit which meant if I ask
> for a byte access, then do it byte-wise, otherwise the processor would
> do a read-modify-write. Clearly this was meant to make it easier for
> dealing with memory mapped devices.
>
> Jeff
On 03/31/2011 08:28 AM, Richard Guenther wrote:
>>> Well, I'm not sure that strict-align targets that provide byte access do
>>> not simply hide the issue inside the CPU (thus, perform the read-modify-write
>>> there and do not guarantee any atomicity unless you ask for it).
>> Certainly some do this internally, but that's clearly out of our
>> control.
>
> Sure. My argument is that the memory model which guarantees
> this kind of things for _any_ memory access is fundamentally flawed.
> They should have simply required annotating objects which should
> behave that way (and then only behave that way "per object", not
> for any concurrent field accesses).

(0) Let's limit our discussion to cpus that are actually put into SMP systems,
    and have been manufactured in the last decade.

(1) Do we agree that all such cpus have user-level store insns with byte
    granularity? Honestly the only non-microcontroller I ever heard of
    without this was the original Alpha. Which is excluded per (0).

(2) Do we agree that all such cpus have on-chip caches?

(3) Let us at this point limit our discussion to cacheable, i.e. non-I/O,
    memory. I believe we can agree that all sorts of system-dependent stuff
    happens in memory-mapped registers.

(4) Do we agree that all such cpus transfer entire cachelines to and fro
    the memory bus? And further that they simultaneously transfer a
    modification mask as part of their cache coherency protocol?

(5) Do we agree that all such cpus use a byte-granular modification mask?

I'm guessing that you don't actually agree on point (5), but ... honestly,
please name the offender because I can't think of one. For the mainstream
processors we really care about, I think every one of them Does The Right Thing.

r~
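Points (4) and (5) can be pictured with a deliberately simplified toy model, sketched below. It is not a description of any real coherence protocol (real coherent caches normally prevent two dirty copies of a line from existing at once); it only illustrates why the granularity of the modification mask matters to the argument. All names here (`struct cached_line`, `line_fill`, `line_store_byte`, the two `writeback_*` functions, `mask_demo`) and the 8-byte line and 4-byte word sizes are assumptions made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_BYTES 8

/* Toy model of one cached copy of a line: data plus one dirty bit per byte.  */
struct cached_line {
  uint8_t data[LINE_BYTES];
  uint8_t dirty;                 /* bit i set => data[i] was modified locally */
};

/* Load a fresh, clean copy of the line from memory.  */
void line_fill(struct cached_line *c, const uint8_t *mem)
{
  for (unsigned i = 0; i < LINE_BYTES; i++)
    c->data[i] = mem[i];
  c->dirty = 0;
}

/* Byte store into the cached copy, recording which byte changed.  */
void line_store_byte(struct cached_line *c, unsigned i, uint8_t v)
{
  assert(i < LINE_BYTES);
  c->data[i] = v;
  c->dirty |= (uint8_t)(1u << i);
}

/* Byte-granular mask: only modified bytes reach memory.  */
void writeback_byte_granular(const struct cached_line *c, uint8_t *mem)
{
  for (unsigned i = 0; i < LINE_BYTES; i++)
    if (c->dirty & (1u << i))
      mem[i] = c->data[i];
}

/* Word-granular mask: any dirty byte drags its whole 4-byte word along.  */
void writeback_word_granular(const struct cached_line *c, uint8_t *mem)
{
  for (unsigned w = 0; w < LINE_BYTES; w += 4)
    if ((c->dirty >> w) & 0xfu)
      for (unsigned i = w; i < w + 4; i++)
        mem[i] = c->data[i];
}

/* Two copies modify neighbouring bytes (a at offset 0, b at offset 1),
   then both are written back.  Returns the final value of b in memory.  */
uint8_t mask_demo(int byte_granular)
{
  uint8_t mem[LINE_BYTES] = { 0, 77 };
  struct cached_line c1, c2;
  line_fill(&c1, mem);
  line_fill(&c2, mem);
  line_store_byte(&c1, 0, 66);   /* copy 1: a = 66 */
  line_store_byte(&c2, 1, 1);    /* copy 2: b = 1 */
  if (byte_granular) {
    writeback_byte_granular(&c2, mem);
    writeback_byte_granular(&c1, mem);
  } else {
    writeback_word_granular(&c2, mem);
    writeback_word_granular(&c1, mem);
  }
  return mem[1];
}
```

With the byte-granular mask both updates survive; with the word-granular one, copy 1's stale value of `b` overwrites copy 2's update, which is the failure mode the disagreement on (5) is about.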
On Fri, Apr 1, 2011 at 9:24 AM, Richard Henderson <rth@redhat.com> wrote:
> (1) Do we agree that all such cpus have user-level store insns with byte
>     granularity? Honestly the only non-microcontroller I ever heard of
>     without this was the original Alpha. Which is excluded per (0).

And SPU, which is also excluded per (0): it is not SMP but rather AMP, as it
does not share memory.

-- Pinski
On Fri, Apr 1, 2011 at 6:24 PM, Richard Henderson <rth@redhat.com> wrote:
> On 03/31/2011 08:28 AM, Richard Guenther wrote:
>>>> Well, I'm not sure that strict-align targets that provide byte access do
>>>> not simply hide the issue inside the CPU (thus, perform the read-modify-write
>>>> there and do not guarantee any atomicity unless you ask for it).
>>> Certainly some do this internally, but that's clearly out of our
>>> control.
>>
>> Sure. My argument is that the memory model which guarantees
>> this kind of things for _any_ memory access is fundamentally flawed.
>> They should have simply required annotating objects which should
>> behave that way (and then only behave that way "per object", not
>> for any concurrent field accesses).
>
> (0) Let's limit our discussion to cpus that are actually put into SMP systems,
>     and have been manufactured in the last decade.
>
> (1) Do we agree that all such cpus have user-level store insns with byte
>     granularity? Honestly the only non-microcontroller I ever heard of
>     without this was the original Alpha. Which is excluded per (0).
>
> (2) Do we agree that all such cpus have on-chip caches?
>
> (3) Let us at this point limit our discussion to cacheable, i.e. non-I/O,
>     memory. I believe we can agree that all sorts of system-dependent stuff
>     happens in memory-mapped registers.
>
> (4) Do we agree that all such cpus transfer entire cachelines to and fro
>     the memory bus? And further that they simultaneously transfer a
>     modification mask as part of their cache coherency protocol?
>
> (5) Do we agree that all such cpus use a byte-granular modification mask?
>
> I'm guessing that you don't actually agree on point (5), but ... honestly,
> please name the offender because I can't think of one. For the mainstream
> processors we really care about, I think every one of them Does The Right Thing.

Yes, we don't agree on (5).

And I can't name a CPU, but I was just guessing that strict alignment
CPUs would have such requirement to also make their store queues simpler
(no need for such mask).

Now, as of (0) I might agree to disregard the original Alpha, but as the
embedded world moves to SMP I'm not sure we can disregard
non-cache coherent NUMA setups or even CPUs without a byte store.

But well, I guess the thing I don't like about the standard is that it makes
people that have started to be somewhat aware about threading issues
_less_ aware of them by providing some "false" safety to them. It
really smells like a standard designed for a very high-level language
where people don't have to think instead of a standard suitable for a
C family language.

Richard.

>
> r~
> But well, I guess the thing I don't like about the standard is that it makes
> people that have started to be somewhat aware about threading issues
> _less_ aware of them by providing some "false" safety to them. It
> really smells like a standard designed for a very high-level language
> where people don't have to think instead of a standard suitable for a
> C family language.

Well, that's not exactly true. You still need to think about threading.
All the standard is doing is guaranteeing that if you already have a data
race free program, the compiler won't add additional races not already
there.

But I'm not a C++ guy. I am no advocate for the standard. I'm just
implementing stuff. Ahem, I'm just a soldier in this war :).

Aldy
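The guarantee described above can be illustrated with a sketch of the kind of rewrite it rules out: turning a conditional store into an unconditional load and store of the same variable. This is fine in single-threaded code but adds a write on a path where the source had none. The function names and the `interleave_fn` hook, which stands in for another thread running at an unlucky moment, are hypothetical and chosen only to make the effect deterministic; nothing here claims that GCC performs this exact transformation.

```c
#include <assert.h>
#include <stddef.h>

/* Shared variable.  Assumed protocol: other threads may write it freely
   whenever the caller passes flag == 0.  */
int shared;

/* Stand-in for "another thread runs here".  */
typedef void (*interleave_fn)(void);

/* As written in the source: with flag == 0 this never touches shared,
   so it cannot race with a concurrent writer.  */
void update_as_written(int flag, int value, interleave_fn other_thread)
{
  assert(other_thread != NULL);
  other_thread();
  if (flag)
    shared = value;
}

/* After a hypothetical rewrite that makes the store unconditional.
   The result is the same single-threaded, but a store now happens even
   when flag == 0, and it can write back a stale value.  */
void update_with_introduced_store(int flag, int value, interleave_fn other_thread)
{
  assert(other_thread != NULL);
  int tmp = shared;        /* introduced load */
  other_thread();          /* the other thread updates shared here */
  if (flag)
    tmp = value;
  shared = tmp;            /* introduced store */
}

/* The "other thread": a plain write to shared.  */
void other_thread_sets_5(void)
{
  shared = 5;
}
```

With `flag == 0`, the first version leaves the other thread's value 5 in place, while the second silently restores the old value. The source program was race free under the stated protocol; the rewritten one is not.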
>> (5) Do we agree that all such cpus use a byte-granular modification mask?

> Now, as of (0) I might agree to disregard the original Alpha, but as the
> embedded world moves to SMP I'm not sure we can disregard
> non-cache coherent NUMA setups or even CPUs without a byte store.

As per 5, it doesn't matter if the CPU lacks a byte store, since the
cache has a byte-granular modification mask.
On 04/02/11 01:56, Richard Guenther wrote:

> But well, I guess the thing I don't like about the standard is that it makes
> people that have started to be somewhat aware about threading issues
> _less_ aware of them by providing some "false" safety to them. It
> really smells like a standard designed for a very high-level language
> where people don't have to think instead of a standard suitable for a
> C family language.

I agree it's unfortunate, but there's a general trend of finding ways to
get more out of less experienced programmers. One of the ways to do
that is to simplify the problem space these guys have to look at.

For better or worse, it's a trend I see continuing indefinitely.

Obviously we're starting to get off-topic..

jeff
Hi,

On Mon, 4 Apr 2011, Aldy Hernandez wrote:

> > > (5) Do we agree that all such cpus use a byte-granular modification mask?
>
> > Now, as of (0) I might agree to disregard the original Alpha, but as the
> > embedded world moves to SMP I'm not sure we can disregard
> > non-cache coherent NUMA setups or even CPUs without a byte store.
>
> As per 5, it doesn't matter if the CPU lacks a byte store, since the
> cache has a byte-granular modification mask.

If it doesn't have byte stores there's no need for byte-granular
modification masks :)

Ciao,
Michael.
On 04/06/11 10:29, Michael Matz wrote:
> Hi,
>
> On Mon, 4 Apr 2011, Aldy Hernandez wrote:
>
>>>> (5) Do we agree that all such cpus use a byte-granular modification mask?
>>
>>> Now, as of (0) I might agree to disregard the original Alpha, but as the
>>> embedded world moves to SMP I'm not sure we can disregard
>>> non-cache coherent NUMA setups or even CPUs without a byte store.
>>
>> As per 5, it doesn't matter if the CPU lacks a byte store, since the
>> cache has a byte-granular modification mask.
>
> If it doesn't have byte stores there's no need for byte-granular
> modification masks :)

I was talking about a CPU with a byte store that is implemented in the
microcode with a wider operation and logical operations that may touch
adjacent fields. If adjacent bytes were touched, the cache would be
updated accordingly, hence the byte-granular modification mask.

That's my understanding anyhow.
Index: testsuite/gcc.dg/memmodel/strict-align-global.c
===================================================================
--- testsuite/gcc.dg/memmodel/strict-align-global.c	(revision 0)
+++ testsuite/gcc.dg/memmodel/strict-align-global.c	(revision 0)
@@ -0,0 +1,46 @@
+/* { dg-do link } */
+/* { dg-options "-O2 --param allow-packed-store-data-races=0" } */
+/* { dg-final { memmodel-gdb-test } } */
+
+#include <stdio.h>
+#include "memmodel.h"
+
+/* This test verifies writes to globals do not write to adjacent
+   globals.  This mostly happens on strict-align targets that are not
+   byte addressable (old Alphas, etc).  */
+
+char a = 0;
+char b = 77;
+
+void memmodel_other_threads()
+{
+}
+
+int memmodel_step_verify()
+{
+  if (b != 77)
+    {
+      printf("FAIL: Unexpected value. <b> is %d, should be 77\n", b);
+      return 1;
+    }
+  return 0;
+}
+
+/* Verify that every variable has the correct value.  */
+int memmodel_final_verify()
+{
+  int ret = memmodel_step_verify ();
+  if (a != 66)
+    {
+      printf("FAIL: Unexpected value. <a> is %d, should be 66\n", a);
+      return 1;
+    }
+  return ret;
+}
+
+int main ()
+{
+  a = 66;
+  memmodel_done();
+  return 0;
+}