Message ID | 1443558863-26132-2-git-send-email-ehabkost@redhat.com |
---|---|
State | New |
Headers | show |
On 29/09/2015 22:34, Eduardo Habkost wrote: > Fix undefined behavior detected by clang runtime check: > > qemu/target-i386/cpu.c:1494:15: runtime error: > left shift of 1 by 31 places cannot be represented in type 'int' > > While doing that, add extra parenthesis for clarity. > > Reported-by: Peter Maydell <peter.maydell@linaro.org> > Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> > --- > target-i386/cpu.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/target-i386/cpu.c b/target-i386/cpu.c > index 2b914b2..6af6db9 100644 > --- a/target-i386/cpu.c > +++ b/target-i386/cpu.c > @@ -1491,7 +1491,7 @@ static void report_unavailable_features(FeatureWord w, uint32_t mask) > int i; > > for (i = 0; i < 32; ++i) { > - if (1 << i & mask) { > + if ((1UL << i) & mask) { 1U is enough. Paolo ps: Ego ceterum censeo that these warnings are useless and uglify the code unnecessarily. But it looks like I'm in a minority so the patch is okay. > const char *reg = get_register_name_32(f->cpuid_reg); > assert(reg); > fprintf(stderr, "warning: %s doesn't support requested feature: " >
On 09/30/2015 11:27 PM, Paolo Bonzini wrote: > > > On 29/09/2015 22:34, Eduardo Habkost wrote: >> Fix undefined behavior detected by clang runtime check: >> >> qemu/target-i386/cpu.c:1494:15: runtime error: >> left shift of 1 by 31 places cannot be represented in type 'int' >> >> While doing that, add extra parenthesis for clarity. >> >> Reported-by: Peter Maydell <peter.maydell@linaro.org> >> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> >> --- >> target-i386/cpu.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/target-i386/cpu.c b/target-i386/cpu.c >> index 2b914b2..6af6db9 100644 >> --- a/target-i386/cpu.c >> +++ b/target-i386/cpu.c >> @@ -1491,7 +1491,7 @@ static void report_unavailable_features(FeatureWord w, uint32_t mask) >> int i; >> >> for (i = 0; i < 32; ++i) { >> - if (1 << i & mask) { >> + if ((1UL << i) & mask) { > > 1U is enough. > > Paolo > > ps: Ego ceterum censeo that these warnings are useless and uglify the > code unnecessarily. But it looks like I'm in a minority so the patch is > okay. I totally agree. There are no ones-compliment machines anymore, and so the whole point of that "undefined" in the C standard is moot. Let's all accept that shifts of signed quantities do exactly what we expect. Without looking, I don't suppose either compiler has a switch to disable just the shift part of ubsan? r~
On 30/09/2015 22:24, Richard Henderson wrote: > On 09/30/2015 11:27 PM, Paolo Bonzini wrote: >> >> >> On 29/09/2015 22:34, Eduardo Habkost wrote: >>> Fix undefined behavior detected by clang runtime check: >>> >>> qemu/target-i386/cpu.c:1494:15: runtime error: >>> left shift of 1 by 31 places cannot be represented in type 'int' >>> >>> While doing that, add extra parenthesis for clarity. >>> >>> Reported-by: Peter Maydell <peter.maydell@linaro.org> >>> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> >>> --- >>> target-i386/cpu.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/target-i386/cpu.c b/target-i386/cpu.c >>> index 2b914b2..6af6db9 100644 >>> --- a/target-i386/cpu.c >>> +++ b/target-i386/cpu.c >>> @@ -1491,7 +1491,7 @@ static void >>> report_unavailable_features(FeatureWord w, uint32_t mask) >>> int i; >>> >>> for (i = 0; i < 32; ++i) { >>> - if (1 << i & mask) { >>> + if ((1UL << i) & mask) { >> >> 1U is enough. >> >> Paolo >> >> ps: Ego ceterum censeo that these warnings are useless and uglify the >> code unnecessarily. But it looks like I'm in a minority so the patch is >> okay. > > I totally agree. There are no ones-compliment machines anymore, and so > the whole point of that "undefined" in the C standard is moot. Let's > all accept that shifts of signed quantities do exactly what we expect. > > Without looking, I don't suppose either compiler has a switch to disable > just the shift part of ubsan? Nope, I already asked. :) Paolo
On 30 September 2015 at 21:24, Richard Henderson <rth@twiddle.net> wrote: > On 09/30/2015 11:27 PM, Paolo Bonzini wrote: >> ps: Ego ceterum censeo that these warnings are useless and uglify the >> code unnecessarily. But it looks like I'm in a minority so the patch is >> okay. > I totally agree. There are no ones-compliment machines anymore, and so the > whole point of that "undefined" in the C standard is moot. Let's all accept > that shifts of signed quantities do exactly what we expect. I'd rather not do that without a documented statement from both clang and gcc teams that they won't use this UB to do optimizations that might break programs relying on it. History suggests they will happily do so if it improves a benchmark at all. Until the toolchain implementers and the C standards bodies define "friendly C" for us, we're stuck with the language we have. > Without looking, I don't suppose either compiler has a switch to disable > just the shift part of ubsan? Not without turning off other shift checks which we would want to retain (like shifts greater than the bitwidth), I think. thanks -- PMM
On 01/10/2015 11:24, Peter Maydell wrote: > On 30 September 2015 at 21:24, Richard Henderson <rth@twiddle.net> wrote: >> On 09/30/2015 11:27 PM, Paolo Bonzini wrote: >>> ps: Ego ceterum censeo that these warnings are useless and uglify the >>> code unnecessarily. But it looks like I'm in a minority so the patch is >>> okay. > >> I totally agree. There are no ones-compliment machines anymore, and so the >> whole point of that "undefined" in the C standard is moot. Let's all accept >> that shifts of signed quantities do exactly what we expect. > > I'd rather not do that without a documented statement from both > clang and gcc teams that they won't use this UB to do optimizations > that might break programs relying on it. History suggests they > will happily do so if it improves a benchmark at all. Well, this is pretty much the only ubsan issue that we stumble upon. You can imagine how common that is in the wild and how good a move that would be to rely on that undefined behavior. In addition, C89 didn't say at all what the result was for signed data types, so technically we could compile QEMU with -std=gnu89 (the default until GCC5) and call it a day. Really the C standard should make this implementation-defined. >> Without looking, I don't suppose either compiler has a switch to disable >> just the shift part of ubsan? > > Not without turning off other shift checks which we would want to > retain (like shifts greater than the bitwidth), I think. I agree those are valuable. Paolo
On 10/01/15 15:52, Paolo Bonzini wrote: > > > On 01/10/2015 11:24, Peter Maydell wrote: >> On 30 September 2015 at 21:24, Richard Henderson <rth@twiddle.net> wrote: >>> On 09/30/2015 11:27 PM, Paolo Bonzini wrote: >>>> ps: Ego ceterum censeo that these warnings are useless and uglify the >>>> code unnecessarily. But it looks like I'm in a minority so the patch is >>>> okay. >> >>> I totally agree. There are no ones-compliment machines anymore, and so the >>> whole point of that "undefined" in the C standard is moot. Let's all accept >>> that shifts of signed quantities do exactly what we expect. >> >> I'd rather not do that without a documented statement from both >> clang and gcc teams that they won't use this UB to do optimizations >> that might break programs relying on it. History suggests they >> will happily do so if it improves a benchmark at all. > > Well, this is pretty much the only ubsan issue that we stumble upon. > You can imagine how common that is in the wild and how good a move that > would be to rely on that undefined behavior. > > In addition, C89 didn't say at all what the result was for signed data > types, so technically we could compile QEMU with -std=gnu89 (the default > until GCC5) and call it a day. > > Really the C standard should make this implementation-defined. Obligatory link: http://blog.regehr.org/archives/1180 :) >>> Without looking, I don't suppose either compiler has a switch to disable >>> just the shift part of ubsan? >> >> Not without turning off other shift checks which we would want to >> retain (like shifts greater than the bitwidth), I think. > > I agree those are valuable. > > Paolo >
On 01/10/2015 19:07, Laszlo Ersek wrote: > > In addition, C89 didn't say at all what the result was for signed data > > types, so technically we could compile QEMU with -std=gnu89 (the default > > until GCC5) and call it a day. > > > > Really the C standard should make this implementation-defined. > > Obligatory link: http://blog.regehr.org/archives/1180 Many ideas in there are good (e.g. mem*() being defined for invalid argument and zero lengths, and of course item 7 which is the issue at hand). In many cases it's also good to change undefined behavior to unspecified values, however I think that goes too far. For example I'm okay with signed integer overflow being undefined behavior, and I also disagree with "It is permissible to compute out-of-bounds pointer values including performing pointer arithmetic on the null pointer". Using uintptr_t is just fine. Also strict aliasing improves performance noticeably at least on some kind of code. The relaxation of strict aliasing that GCC does with unions would be a useful addition to the C standard, though. Paolo
On 1 October 2015 at 18:30, Paolo Bonzini <pbonzini@redhat.com> wrote: > > > On 01/10/2015 19:07, Laszlo Ersek wrote: >> > In addition, C89 didn't say at all what the result was for signed data >> > types, so technically we could compile QEMU with -std=gnu89 (the default >> > until GCC5) and call it a day. >> > >> > Really the C standard should make this implementation-defined. >> >> Obligatory link: http://blog.regehr.org/archives/1180 > > Many ideas in there are good (e.g. mem*() being defined for invalid > argument and zero lengths, and of course item 7 which is the issue at > hand). In many cases it's also good to change undefined behavior to > unspecified values, however I think that goes too far. > > For example I'm okay with signed integer overflow being undefined > behavior, and I also disagree with "It is permissible to compute > out-of-bounds pointer values including performing pointer arithmetic on > the null pointer". Using uintptr_t is just fine. I bet you QEMU breaks the 'out of bounds pointer arithmetic' rule all over the place. (set_prop_arraylen(), for a concrete example off the top of my head.) Signed integer overflow being UB is a really terrible idea which is one of the core cases for nailing down the UB -- everybody expects signed integers to behave as 2s-complement, when in fact what the compiler can and will do currently is just do totally unpredictable things... > Also strict aliasing improves performance noticeably at least on some > kind of code. The relaxation of strict aliasing that GCC does with > unions would be a useful addition to the C standard, though. QEMU currently turns off strict-aliasing entirely, which I think is entirely sensible of us. A lot of the underlying intention behind the proposal (as I interpret it) is "consistency and predictability of behaviour for the programmer trumps pure performance". That sounds like a good idea to me. thanks -- PMM
On 10/01/15 19:30, Paolo Bonzini wrote: > > > On 01/10/2015 19:07, Laszlo Ersek wrote: >>> In addition, C89 didn't say at all what the result was for signed data >>> types, so technically we could compile QEMU with -std=gnu89 (the default >>> until GCC5) and call it a day. >>> >>> Really the C standard should make this implementation-defined. >> >> Obligatory link: http://blog.regehr.org/archives/1180 > > Many ideas in there are good (e.g. mem*() being defined for invalid > argument and zero lengths, and of course item 7 which is the issue at > hand). In many cases it's also good to change undefined behavior to > unspecified values, however I think that goes too far. > > For example I'm okay with signed integer overflow being undefined > behavior, and I also disagree with "It is permissible to compute > out-of-bounds pointer values including performing pointer arithmetic on > the null pointer". Using uintptr_t is just fine. > > Also strict aliasing improves performance noticeably at least on some > kind of code. The relaxation of strict aliasing that GCC does with > unions would be a useful addition to the C standard, though. What do you mean under "relaxation of strict aliasing that GCC does with unions"? I believe I know how unions affect this (although for details I'd obviously have to consult the standard :)), but what are the gcc specific parts? Thanks! Laszlo
On 10/01/15 19:38, Peter Maydell wrote: > On 1 October 2015 at 18:30, Paolo Bonzini <pbonzini@redhat.com> wrote: >> >> >> On 01/10/2015 19:07, Laszlo Ersek wrote: >>>> In addition, C89 didn't say at all what the result was for signed data >>>> types, so technically we could compile QEMU with -std=gnu89 (the default >>>> until GCC5) and call it a day. >>>> >>>> Really the C standard should make this implementation-defined. >>> >>> Obligatory link: http://blog.regehr.org/archives/1180 >> >> Many ideas in there are good (e.g. mem*() being defined for invalid >> argument and zero lengths, and of course item 7 which is the issue at >> hand). In many cases it's also good to change undefined behavior to >> unspecified values, however I think that goes too far. >> >> For example I'm okay with signed integer overflow being undefined >> behavior, and I also disagree with "It is permissible to compute >> out-of-bounds pointer values including performing pointer arithmetic on >> the null pointer". Using uintptr_t is just fine. > > I bet you QEMU breaks the 'out of bounds pointer arithmetic' > rule all over the place. (set_prop_arraylen(), for a concrete > example off the top of my head.) > > Signed integer overflow being UB is a really terrible idea which > is one of the core cases for nailing down the UB -- everybody > expects signed integers to behave as 2s-complement, when in > fact what the compiler can and will do currently is just do totally > unpredictable things... > >> Also strict aliasing improves performance noticeably at least on some >> kind of code. The relaxation of strict aliasing that GCC does with >> unions would be a useful addition to the C standard, though. > > QEMU currently turns off strict-aliasing entirely, which I think > is entirely sensible of us. Hm, I didn't know that. Indeed it is part of QEMU_CFLAGS. Another example: the kernel. In the top Makefile, KBUILD_CFLAGS gets -fno-strict-aliasing. And according to "Documentation/kbuild/makefiles.txt", "... the top Makefile owns the variable $(KBUILD_CFLAGS) and uses it for compilation flags for the entire tree". Yet another example: edk2. (See "BaseTools/Conf/tools_def.template", GCC_ALL_CC_FLAGS.) > A lot of the underlying intention behind the proposal (as I > interpret it) is "consistency and predictability of behaviour > for the programmer trumps pure performance". That sounds like > a good idea to me. I once spent an afternoon interpreting the "effective type" paragraphs in the C standard ("6.5 Expressions", paragraphs 6 and 7). They make sense, and it is possible to write conformant code. Here's an example: - In the firmware, allocate an array of bytes, dynamically. This array will have no declared type. - Populate the array byte-wise, from fw_cfg. Because the stores happen through character-typed lvalues, they do not "imbue" the target object with any effective type, for further accesses that do not modify the value. (I.e., for further reads.) - Get a (uint8_t*) into the array somewhere, and cast it to (struct acpi_table_hdr *). Read fields through the cast pointer. Assuming no out-of-bounds situation (considering the entire pointed to acpi_table_hdr struct), and assuming no alignment violations for the fields (which is implementation-defined), these accesses will be fine. *However*. If in point 2 you populate the array with uint64_t accesses, that *does* imbue the array elements with an effective type that is binding for further read accesses. And, in step 3, because the ACPI table header struct does not include uint64_t fields, those accesses will be undefined behavior. ... I don't know who on earth has brain capacity for tracking this. Effective type *does* propagate in a trackable manner, but it is one order of magnitude harder to follow for humans than integer conversions -- and resultant ranges -- are (and those are hard enough already!). So, it would be nice and prudent to comply with the effective type / strict aliasing rules, and allow the compiler to optimize "more", but personally I think I wouldn't be able to track effective type *realiably* (despite being fully conscious of integer promotions and conversions, for example). Therefore, I embrace -fno-strict-aliasing. Thanks Laszlo
Peter Maydell <peter.maydell@linaro.org> writes: > On 1 October 2015 at 18:30, Paolo Bonzini <pbonzini@redhat.com> wrote: >> >> >> On 01/10/2015 19:07, Laszlo Ersek wrote: >>> > In addition, C89 didn't say at all what the result was for signed data >>> > types, so technically we could compile QEMU with -std=gnu89 (the default >>> > until GCC5) and call it a day. >>> > >>> > Really the C standard should make this implementation-defined. >>> >>> Obligatory link: http://blog.regehr.org/archives/1180 >> >> Many ideas in there are good (e.g. mem*() being defined for invalid >> argument and zero lengths, and of course item 7 which is the issue at >> hand). In many cases it's also good to change undefined behavior to >> unspecified values, however I think that goes too far. >> >> For example I'm okay with signed integer overflow being undefined >> behavior, and I also disagree with "It is permissible to compute >> out-of-bounds pointer values including performing pointer arithmetic on >> the null pointer". Using uintptr_t is just fine. > > I bet you QEMU breaks the 'out of bounds pointer arithmetic' > rule all over the place. (set_prop_arraylen(), for a concrete > example off the top of my head.) > > Signed integer overflow being UB is a really terrible idea which > is one of the core cases for nailing down the UB -- everybody > expects signed integers to behave as 2s-complement, when in > fact what the compiler can and will do currently is just do totally > unpredictable things... > >> Also strict aliasing improves performance noticeably at least on some >> kind of code. The relaxation of strict aliasing that GCC does with >> unions would be a useful addition to the C standard, though. > > QEMU currently turns off strict-aliasing entirely, which I think > is entirely sensible of us. > > A lot of the underlying intention behind the proposal (as I > interpret it) is "consistency and predictability of behaviour > for the programmer trumps pure performance". That sounds like > a good idea to me. We do not have a raging "oh my god the compiler can't sufficiently optimize" crisis. We do have a raging "we can't get our software sufficiently reliable" crisis.
On 01/10/2015 21:17, Laszlo Ersek wrote: > - In the firmware, allocate an array of bytes, dynamically. This array > will have no declared type. > > - Populate the array byte-wise, from fw_cfg. Because the stores happen > through character-typed lvalues, they do not "imbue" the target > object with any effective type, for further accesses that do not > modify the value. (I.e., for further reads.) > > - Get a (uint8_t*) into the array somewhere, and cast it to > (struct acpi_table_hdr *). Read fields through the cast pointer. > Assuming no out-of-bounds situation (considering the entire > pointed to acpi_table_hdr struct), and assuming no alignment > violations for the fields (which is implementation-defined), these > accesses will be fine. > > *However*. If in point 2 you populate the array with uint64_t accesses, > that *does* imbue the array elements with an effective type that is > binding for further read accesses. Then don't do it. Use memcpy from uint64_t to the array. Type punning has other problems than aliasing---for example some architectures require pointers to be correctly aligned when accessing objects bigger than a byte. > ... I don't know who on earth has brain capacity for tracking this. If you can't understand a rule (or understanding it burns too much of your brain cycles), just find a pattern that lets you respect it without much thought. For strict aliasing it's just "don't cast pointer types" with a single exception, namely casting a pointer to struct to a pointer to the first member's type and the other way round. Everything else can either be expressed as container_of, or simply prohibited. > Effective type *does* propagate in a trackable manner, but it is one > order of magnitude harder to follow for humans than integer conversions > -- and resultant ranges -- are (and those are hard enough already!). Integer conversions are already too much for me, in fact. Here my pattern there is just: 1) use uint16_t as sparsely as possible (because the result of a multiplication can overflow, unlike uint8_t); 2) never write unsigned int constants---this doesn't apply to unsigned long long constants, which instead I use liberally; 3) rely heavily on Coverity to detect narrow types being used as {,u}int64_t after arithmetic has been done on int. Never writing unsigned int constants conflicts heavily with this ubsan rule. And I can always use the excuse that I'm writing gnu89 code rather than c99. :) Paolo
On 01/10/2015 20:40, Laszlo Ersek wrote: > > Also strict aliasing improves performance noticeably at least on some > > kind of code. The relaxation of strict aliasing that GCC does with > > unions would be a useful addition to the C standard, though. > > What do you mean under "relaxation of strict aliasing that GCC does with > unions"? I believe I know how unions affect this (although for details > I'd obviously have to consult the standard :)), but what are the gcc > specific parts? I remembered wrong---it's not a relaxation of strict aliasing, it's defining what happens when a member of a union is accessed through a member of a different type. C89 makes that implementation-defined (3.3.2.3) and GCC defines the behavior the sane way: "the relevant bytes of the representation of the object are treated as an object of the type used for the access". C99 makes the other members "take unspecified values" (6.2.6.1). I have always thought this to be weaker than GCC's promise, but found out that a subsequent TR added a footnote to clarify that the desired behavior is the sane one too. C11 also has the same footnote, and no other change in this area. Paolo
On 10/02/15 10:34, Paolo Bonzini wrote: > > > On 01/10/2015 21:17, Laszlo Ersek wrote: >> - In the firmware, allocate an array of bytes, dynamically. This array >> will have no declared type. >> >> - Populate the array byte-wise, from fw_cfg. Because the stores happen >> through character-typed lvalues, they do not "imbue" the target >> object with any effective type, for further accesses that do not >> modify the value. (I.e., for further reads.) >> >> - Get a (uint8_t*) into the array somewhere, and cast it to >> (struct acpi_table_hdr *). Read fields through the cast pointer. >> Assuming no out-of-bounds situation (considering the entire >> pointed to acpi_table_hdr struct), and assuming no alignment >> violations for the fields (which is implementation-defined), these >> accesses will be fine. >> >> *However*. If in point 2 you populate the array with uint64_t accesses, >> that *does* imbue the array elements with an effective type that is >> binding for further read accesses. > > Then don't do it. Use memcpy from uint64_t to the array. It won't work; memcpy() propagates the effective type. (So does a loop that copies char-wise.) From 6.5p7: If a value is copied into an object having no declared type using /memcpy/ or /memmove/, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. > Type punning > has other problems than aliasing---for example some architectures > require pointers to be correctly aligned when accessing objects bigger > than a byte. I definitely agree that alignment is a question to be considered; in fact the C standard speaks about pointer alignment with reference to pointer *conversion*, not pointer dereferencing. (In other words, if you cast a pointer-to-typeA to pointer-to-typeB, and the result is not correctly aligned for typeB, then you get undefined behavior right at the conversion, regardless of dereferencing the cast pointer later or not.) 6.3.2.3p7: A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned (footnote 57) for the pointed-to type, the behavior is undefined. [...] ( Footnote 57: In general, the concept ‘‘correctly aligned’’ is transitive: if a pointer to type A is correctly aligned for a pointer to type B, which in turn is correctly aligned for a pointer to type C, then a pointer to type A is correctly aligned for a pointer to type C. ) However, alignment requirements are implementation-defined, and if you know your platform (and in low level code you frequently do), you can cover that base. > >> ... I don't know who on earth has brain capacity for tracking this. > > If you can't understand a rule (or understanding it burns too much of > your brain cycles), just find a pattern that lets you respect it without > much thought. I understand the rule just fine. *Applying* the rule consistently is very costly. > For strict aliasing it's just "don't cast pointer types" > with a single exception, namely casting a pointer to struct to a pointer > to the first member's type and the other way round. Everything else can > either be expressed as container_of, or simply prohibited. How about parsing network data? Assume that you have verified the size of the uint8_t buffer, and that you have verified the message type with direct *(uint8_t*) access. Assume further that you have a sequence of such messages concatenated, each with a different type (implying a different struct). I don't see how unions would apply here with any flexibility. One thing that would certainly work is to define one separate local variable for each possible message type (ie. struct) -- or create one union variable that can contain each one of those structs --, and then memcpy() data from the buffer into the right variable (or union member) before accessing the message fields in that variable / struct. The constant copying is incredibly annoying, compared to advancing a uint8_t* pointer in the buffer, checking remaining buffer size, and casting. (I wonder if reinterpret_cast<>() would allow this in C++, BTW.) > >> Effective type *does* propagate in a trackable manner, but it is one >> order of magnitude harder to follow for humans than integer conversions >> -- and resultant ranges -- are (and those are hard enough already!). > > Integer conversions are already too much for me, in fact. > > Here my pattern there is just: 1) use uint16_t as sparsely as possible > (because the result of a multiplication can overflow, unlike uint8_t); Okay, I think I can follow that... > 2) never write unsigned int constants Ah, so this is where it comes from! :) So, I guess the idea is that you'd like to stay in "int" as much as possible. (And, with respect to the above point, both uint8_t and uint16_t are promoted to int (=== int32_t), on all platforms that matter.) >---this doesn't apply to unsigned > long long constants, which instead I use liberally; 3) rely heavily on > Coverity to detect narrow types being used as {,u}int64_t after > arithmetic has been done on int. I see -- Coverity should help to identify expressions where the cast should be done before doing the arithmetic. > > Never writing unsigned int constants conflicts heavily with this ubsan > rule. And I can always use the excuse that I'm writing gnu89 code > rather than c99. :) I see. In comparison, I'm a huge fan of unsigned-only, both in variables / fields and in constants. :) One random example: (a - b). If "a" and "b" are unsigned, then (1) wrapping is well-defined, (2) if you don't want it, it is easy to detect beforehand: if (a < b) { no can do } However, if "a" and "b" are both signed ints, then (1) wrapping in the subtraction is undefined behavior, (2) if you want to prevent it, the failure condition *in math* (not in C) is: (a - b) < INT_MIN || (a - b) > INT_MAX How can we evaluate that? *** If (a >= b), then the first condition is false, and the second condition decides. The second condition is equivalent to both of: a > INT_MAX + b [i] a - INT_MAX > b [ii] [i] Since adding INT_MAX, a positive value, increases the sum, the expression (INT_MAX + b) can be evaluated if (INT_MAX + b) <= INT_MAX in other words, if b <= 0 [ii] Otherwise, (a - INT_MAX) can always be evaluated. This is because the requirement is a - INT_MAX >= INT_MIN (since subtracting a positive value decreases the sum), which is equivalent to a >= INT_MIN + INT_MAX Now, we have (b > 0) here -- this is the "otherwise" branch of [i] --, and our starting condition was (a >= b), therefore we have (a > 0), which implies the above. Thus, for the (a >= b) case, we have, as failure condition: (b <= 0) ? (a > INT_MAX + b) : (a - INT_MAX > b) **** If (a < b), then the second condition is false, and the first condition decides. The first condition is equivalent to both of: a < INT_MIN + b [iii] a - INT_MIN < b [iv] [iii] Since adding INT_MIN, a negative value, decreases the sum, the expression (INT_MIN + b) can be evaluated if: (INT_MIN + b) >= INT_MIN in other words, if b >= 0 [iv] Otherwise, since subtracting INT_MIN, a negative value, increases the sum, the expression (a - INT_MIN) can be evaluated if: (a - INT_MIN) <= INT_MAX which is equivalent to a <= INT_MAX + INT_MIN Now, we have (b < 0) here -- we're on the "otherwise" branch of [iii] --, plus our starting condition was (a < b). Those together mean a < -1 which guarantees our requirement a <= INT_MAX + INT_MIN Therefore (a - INT_MIN) can always be evaluated on this branch. Leaving us with the following failure condition, for the (a < b) case: (b >= 0) ? (a < INT_MIN + b) : (a - INT_MIN < b) **** The final condition is (a >= b) ? ((b <= 0) ? (a > INT_MAX + b) : (a - INT_MAX > b)) : ((b >= 0) ? (a < INT_MIN + b) : (a - INT_MIN < b)) If this evaluates to true, then (a - b) would be undefined behavior. ... Given that we almost never need negative integer values, I'd rather stick with unsigned variables, unsigned constants, and write (a<b), in order to check against wrapping, than use the above monstrosity. (If I messed up the calculation, that only strengthens my point.) Sure, we can always cast to int64_t first... and if we're subtracting int64_t, we can always cast to Int128 first... :P Laszlo
On 02/10/2015 13:14, Laszlo Ersek wrote: > On 10/02/15 10:34, Paolo Bonzini wrote: >> On 01/10/2015 21:17, Laszlo Ersek wrote: >>> - In the firmware, allocate an array of bytes, dynamically. This array >>> will have no declared type. >>> >>> - Populate the array byte-wise, from fw_cfg. Because the stores happen >>> through character-typed lvalues, they do not "imbue" the target >>> object with any effective type, for further accesses that do not >>> modify the value. (I.e., for further reads.) >>> >>> - Get a (uint8_t*) into the array somewhere, and cast it to >>> (struct acpi_table_hdr *). Read fields through the cast pointer. >>> Assuming no out-of-bounds situation (considering the entire >>> pointed to acpi_table_hdr struct), and assuming no alignment >>> violations for the fields (which is implementation-defined), these >>> accesses will be fine. >>> >>> *However*. If in point 2 you populate the array with uint64_t accesses, >>> that *does* imbue the array elements with an effective type that is >>> binding for further read accesses. >> >> Then don't do it. Use memcpy from uint64_t to the array. > > It won't work; memcpy() propagates the effective type. Doh. I guess that's another "not in practice" case. Saying "memcpy to {,u}int8_t doesn't propagate the effective type" would probably go to great lengths towards fixing this. > So, I guess the idea is that you'd like to stay in "int" as much as > possible. Yes. Except move to 64-bit as early as possible if it will be necessary to do that. > (And, with respect to the above point, both uint8_t and > uint16_t are promoted to int (=== int32_t), on all platforms that matter.) Yes, but uint8_t arithmetic cannot overflow as easily as uint16_t. int16_t is fine, but not as useful as uint16_t could be. > In comparison, I'm a huge fan of unsigned-only, both in variables / > fields and in constants. :) > > One random example: (a - b). If "a" and "b" are unsigned, then (1) > wrapping is well-defined, (2) if you don't want it Sorry for snipping your derivation (which I did read) but... checking for overflow is not the common case. The common case is that you want to cast "a" and "b" to a 64-bit type. :) And if you already have an int64_t, that is also not the common case: it is not too useful to _store_ int64_t's. uint64_t's are useful because they are size_t's. But ptrdiff_t overflows usually result from multiplication, not from addition or subtractions. I know these are sweeping generalizations, but generalizations are what we use to unload our brains from the nitty-gritty details. > ... Given that we almost never need negative integer values, I'd rather > stick with unsigned variables, unsigned constants, and write (a<b), in > order to check against wrapping, than use the above monstrosity. It's not a panacea, for example for (i = 0; i <= j; i++) can be an infinite loop for unsigned but not for signed (and this, again, has an effect on what optimizations the compiler can do). Since I'm not a precise person, I wouldn't expect that to be a possibly infinite loop. Using "int" makes the compiler's behavior match my intuition more closely. Paolo > Sure, we can always cast to int64_t first... and if we're subtracting > int64_t, we can always cast to Int128 first... :P > > Laszlo >
On Fri, Oct 02, 2015 at 02:07:32PM +0200, Paolo Bonzini wrote: > On 02/10/2015 13:14, Laszlo Ersek wrote: > > On 10/02/15 10:34, Paolo Bonzini wrote: > >> On 01/10/2015 21:17, Laszlo Ersek wrote: > >>> - In the firmware, allocate an array of bytes, dynamically. This array > >>> will have no declared type. > >>> > >>> - Populate the array byte-wise, from fw_cfg. Because the stores happen > >>> through character-typed lvalues, they do not "imbue" the target > >>> object with any effective type, for further accesses that do not > >>> modify the value. (I.e., for further reads.) > >>> > >>> - Get a (uint8_t*) into the array somewhere, and cast it to > >>> (struct acpi_table_hdr *). Read fields through the cast pointer. > >>> Assuming no out-of-bounds situation (considering the entire > >>> pointed to acpi_table_hdr struct), and assuming no alignment > >>> violations for the fields (which is implementation-defined), these > >>> accesses will be fine. > >>> > >>> *However*. If in point 2 you populate the array with uint64_t accesses, > >>> that *does* imbue the array elements with an effective type that is > >>> binding for further read accesses. > >> > >> Then don't do it. Use memcpy from uint64_t to the array. > > > > It won't work; memcpy() propagates the effective type. > > Doh. I guess that's another "not in practice" case. Saying "memcpy to > {,u}int8_t doesn't propagate the effective type" would probably go to > great lengths towards fixing this. Just to be pedantic, uint8_t/int8_t are not the same as 'char' wrt aliasing rules. (The standard defines writes to a char array/pointer as being allowed to alias with other types, but does not say that about int8_t.) Gcc currently treats them as the same; I actually tried to get gcc to change that a few months ago: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66110#c13 FWIW, I think the aliasing rules allow for very useful optimizations and I wouldn't want to turn them off for programs where performance is important. The test case in the bug link above (which the gcc developers thankfully did address!) is a good example of the utility of alias detection. This function: void func(struct s2 *p) { p->p1->f2 = 9; p->p1->f2 = 10; } can't be optimized without -fstrict-aliasing. Indeed, even if the code was changed to p->p1->f3 = 11; p->p1->f4 = 12; then gcc would still need to reload p->p1 after every store. That's just silly. -Kevin
diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 2b914b2..6af6db9 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -1491,7 +1491,7 @@ static void report_unavailable_features(FeatureWord w, uint32_t mask) int i; for (i = 0; i < 32; ++i) { - if (1 << i & mask) { + if ((1UL << i) & mask) { const char *reg = get_register_name_32(f->cpuid_reg); assert(reg); fprintf(stderr, "warning: %s doesn't support requested feature: "
Fix undefined behavior detected by clang runtime check: qemu/target-i386/cpu.c:1494:15: runtime error: left shift of 1 by 31 places cannot be represented in type 'int' While doing that, add extra parenthesis for clarity. Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> --- target-i386/cpu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)