diff mbox

DWARF: make signedness explicit for enumerator const values

Message ID 20161013161201.10521-1-derodat@adacore.com
State New
Headers show

Commit Message

Pierre-Marie de Rodat Oct. 13, 2016, 4:12 p.m. UTC
Hello,

Currently, the DWARF description does not specify the signedness of the
representation of enumeration types.  This is a problem in some
contexts where DWARF consumers need to determine if value X is greater
than value Y.

For instance in Ada:

    type Enum_Type is ( A, B, C, D);
    for Enum_Type use (-1, 0, 1, 2);

    type Rec_Type (E : Enum_Type) is record
       when A .. B => null;
       when others => B : Booleann;
    end record;

The above can be described in DWARF the following way:

    DW_TAG_enumeration_type(Enum_Type)
    | DW_AT_byte_size: 1
      DW_TAG_enumerator(A)
      | DW_AT_const_value: -1
      DW_TAG_enumerator(B)
      | DW_AT_const_value: 0
      DW_TAG_enumerator(C)
      | DW_AT_const_value: 1
      DW_TAG_enumerator(D)
      | DW_AT_const_value: 2

    DW_TAG_structure_type(Rec_Type)
      DW_TAG_member(E)
      | DW_AT_type: <Enum_Type>
      DW_TAG_variant_part
      | DW_AT_discr: <E>
        DW_TAG_variant
        | DW_AT_discr_list: DW_DSC_range 0x7f 0
        DW_TAG_variant
        | DW_TAG_member(b)

DWARF consumers need to know that enumerators (A, B, C and D) are signed
in order to determine the set of E values for which Rec_Type has a B
field.  In practice, they need to know how to interpret the 0x7f LEB128
number above (-1, not 127).

There seems to be only two alternatives to solve this issue: one is to
add a DW_AT_type attribute to DW_TAG_enumerator_type DIEs to make it
point to a base type that specifies the signedness.  The other is to
make sure the form of the DW_AT_const_value attribute carries the
signedness information.  This patch implements the latter.

Currently, most of these attributes are generated with DW_FORM_data*
forms (dw_val_class_unsigned_const).  This patch changes the enumerator
description generation to always use instead the DW_FORM_[us]data forms.
It does so adding a new dw_val_class ("explicit unsigned const"), using
it for unsigned enumerators and using "[signed] const" for the signed
ones.

Bootstrapped and regtested (GCC+GDB testsuites) sucessfully on
x86_64-linux.  I also checked that the new testcase fails with current
trunk.  Ok to commit?

Thank you in advance!

gcc/

	* dwarf2out.h (enum dw_val_class): Add a
	dw_val_class_explicit_unsigned_const class.
	(struct dw_val_node): Add a val_explicit_unsigned variant.
	* dwarf2out.c (dw_val_equal_p, print_dw_val, attr_checksum,
	attr_checksum_ordered, same_dw_val_p, size_of_die, value_format,
	output_die): Handle dw_val_class_explicit_unsigned_const.
	(add_AT_explicit_unsigned, AT_explicit_unsigned): New functions.
	(gen_enumeration_type_die): Use the explicit unsigned const form
	for all unsigned enumerator values and use the explicit [signed]
	const form for all signed ones.

gcc/testsuite/

	* gnat.dg/debug10.adb: New testcase.
---
 gcc/dwarf2out.c                   | 61 ++++++++++++++++++++++++++++++++++-----
 gcc/dwarf2out.h                   |  3 ++
 gcc/testsuite/gnat.dg/debug10.adb | 39 +++++++++++++++++++++++++
 3 files changed, 95 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gnat.dg/debug10.adb

Comments

Pierre-Marie de Rodat Nov. 8, 2016, 2:26 p.m. UTC | #1
Hello,

Ping for the patch submitted there: 
<https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01060.html>.

Thank you in advance!
Mark Wielaard Nov. 10, 2016, 12:38 p.m. UTC | #2
On Thu, 2016-10-13 at 18:12 +0200, Pierre-Marie de Rodat wrote:
> Currently, the DWARF description does not specify the signedness of the
> representation of enumeration types.  This is a problem in some
> contexts where DWARF consumers need to determine if value X is greater
> than value Y.
> 
> For instance in Ada:
> 
>     type Enum_Type is ( A, B, C, D);
>     for Enum_Type use (-1, 0, 1, 2);
> 
>     type Rec_Type (E : Enum_Type) is record
>        when A .. B => null;
>        when others => B : Booleann;
>     end record;
> 
> The above can be described in DWARF the following way:
> 
>     DW_TAG_enumeration_type(Enum_Type)
>     | DW_AT_byte_size: 1
>       DW_TAG_enumerator(A)
>       | DW_AT_const_value: -1
>       DW_TAG_enumerator(B)
>       | DW_AT_const_value: 0
>       DW_TAG_enumerator(C)
>       | DW_AT_const_value: 1
>       DW_TAG_enumerator(D)
>       | DW_AT_const_value: 2
> 
>     DW_TAG_structure_type(Rec_Type)
>       DW_TAG_member(E)
>       | DW_AT_type: <Enum_Type>
>       DW_TAG_variant_part
>       | DW_AT_discr: <E>
>         DW_TAG_variant
>         | DW_AT_discr_list: DW_DSC_range 0x7f 0
>         DW_TAG_variant
>         | DW_TAG_member(b)
> 
> DWARF consumers need to know that enumerators (A, B, C and D) are signed
> in order to determine the set of E values for which Rec_Type has a B
> field.  In practice, they need to know how to interpret the 0x7f LEB128
> number above (-1, not 127).
> 
> There seems to be only two alternatives to solve this issue: one is to
> add a DW_AT_type attribute to DW_TAG_enumerator_type DIEs to make it
> point to a base type that specifies the signedness.  The other is to
> make sure the form of the DW_AT_const_value attribute carries the
> signedness information.  This patch implements the latter.

IMHO having an explicit DW_AT_type pointing at the base type with size
and encoding for the DW_TAG_enumerator_type is better for consumers than
having to try and interpret the DW_FORM used to encode the values.

Alternatively could we just attach a DW_AT_encoding to the
DW_TAG_enumeration_type? The spec doesn't list it as one of the
attributes for an enumeration_type, but it makes sense given it already
carries bit/byte size attributes.

Thanks,

Mark
Pierre-Marie de Rodat Nov. 14, 2016, 11:08 a.m. UTC | #3
Mark,

Thank you for your answer!

On 11/10/2016 01:38 PM, Mark Wielaard wrote:
> IMHO having an explicit DW_AT_type pointing at the base type with size
> and encoding for the DW_TAG_enumerator_type is better for consumers than
> having to try and interpret the DW_FORM used to encode the values.

I’m curious about why you think this alternative is better for 
consumers: after all, they always have to interpret the DW_FORM anyway 
in order to decode the DIE stream. At least this goes against the DWARF 
standard’s “strong” recommendation: section 7.5.4 Attribute Encodings says:

     If one of the DW_FORM_data<n> forms is used to represent a signed
     or unsigned integer, it can be hard for a consumer to discover the
     context necessary to determine which interpretation is intended.
     Producers are therefore strongly encouraged to use DW_FORM_sdata or
     DW_FORM_udata for signed and unsigned integers respectively, rather
     than DW_FORM_data<n>.

> Alternatively could we just attach a DW_AT_encoding to the
> DW_TAG_enumeration_type? The spec doesn't list it as one of the
> attributes for an enumeration_type, but it makes sense given it already
> carries bit/byte size attributes.

I agree it would make sense, but would it be acceptable to enable this 
even in strict mode? If not, I’d prefer to stick to a solution that 
would apply everywhere. :-)
Mark Wielaard Nov. 14, 2016, 12:05 p.m. UTC | #4
On Mon, 2016-11-14 at 12:08 +0100, Pierre-Marie de Rodat wrote:
> Thank you for your answer!
> 
> On 11/10/2016 01:38 PM, Mark Wielaard wrote:
> > IMHO having an explicit DW_AT_type pointing at the base type with size
> > and encoding for the DW_TAG_enumerator_type is better for consumers than
> > having to try and interpret the DW_FORM used to encode the values.
> 
> I’m curious about why you think this alternative is better for 
> consumers: after all, they always have to interpret the DW_FORM anyway 
> in order to decode the DIE stream.

Right, this comes from having these untyped (and unsized) forms in the
constant class. And the constant class being used with and without
context about how to interpret the constant (is it a bit pattern or a
value). In this particular case the value is represented through a
DW_AT_const_value attribute which can also be represented in block form.
Without (type) context you also don't know anything about the size. You
can either choose a signed/unsigned form not giving the consumer a hint
about the size of the underlying constant value or one of the sized data
forms that don't give a hint about the value representation/signedness.
So in both cases the consumer looses without an actual (base) type
telling them how to interpret the constant.

If the type/context is known then for a consumer it is easy to just have
a read signed/unsigned value method for attributes that provide a
constant/value which doesn't care about the underlying form. That also
means the producer can choose the smallest representation of the data
without worrying that the consumer might misinterpret the value
representation by the specific form chosen.

Cheers,

Mark
Pierre-Marie de Rodat Nov. 18, 2016, 5:22 p.m. UTC | #5
On 11/14/2016 01:05 PM, Mark Wielaard wrote:
> You can either choose a signed/unsigned form not giving the consumer
> a hint about the size of the underlying constant value or one of the
> sized data forms that don't give a hint about the value
> representation/signedness. So in both cases the consumer looses
> without an actual (base) type telling them how to interpret the
> constant.

I see, thank you for the explanation! :-)

I agree with you, so I’ll revise to instead add a DW_AT_type attribute 
to enumeration DIEs to point to a base type.
diff mbox

Patch

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index b5787ef..7022e6c 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -1361,6 +1361,7 @@  dw_val_equal_p (dw_val_node *a, dw_val_node *b)
 
     case dw_val_class_offset:
     case dw_val_class_unsigned_const:
+    case dw_val_class_explicit_unsigned_const:
     case dw_val_class_const:
     case dw_val_class_range_list:
     case dw_val_class_lineptr:
@@ -3947,6 +3948,29 @@  AT_unsigned (dw_attr_node *a)
   return a->dw_attr_val.v.val_unsigned;
 }
 
+/* Add an explicitely unsigned integer attribute value to a DIE.  */
+
+static inline void
+add_AT_explicit_unsigned (dw_die_ref die, enum dwarf_attribute attr_kind,
+			  unsigned HOST_WIDE_INT unsigned_val)
+{
+  dw_attr_node attr;
+
+  attr.dw_attr = attr_kind;
+  attr.dw_attr_val.val_class = dw_val_class_explicit_unsigned_const;
+  attr.dw_attr_val.val_entry = NULL;
+  attr.dw_attr_val.v.val_explicit_unsigned = unsigned_val;
+  add_dwarf_attr (die, &attr);
+}
+
+static inline unsigned HOST_WIDE_INT
+AT_explicit_unsigned (dw_attr_node *a)
+{
+  gcc_assert (a != NULL
+	      && AT_class (a) == dw_val_class_explicit_unsigned_const);
+  return a->dw_attr_val.v.val_explicit_unsigned;
+}
+
 /* Add an unsigned wide integer attribute value to a DIE.  */
 
 static inline void
@@ -5600,6 +5624,7 @@  print_dw_val (dw_val_node *val, bool recurse, FILE *outfile)
       fprintf (outfile, HOST_WIDE_INT_PRINT_DEC, val->v.val_int);
       break;
     case dw_val_class_unsigned_const:
+    case dw_val_class_explicit_unsigned_const:
       fprintf (outfile, HOST_WIDE_INT_PRINT_UNSIGNED, val->v.val_unsigned);
       break;
     case dw_val_class_const_double:
@@ -5998,6 +6023,7 @@  attr_checksum (dw_attr_node *at, struct md5_ctx *ctx, int *mark)
       CHECKSUM (at->dw_attr_val.v.val_int);
       break;
     case dw_val_class_unsigned_const:
+    case dw_val_class_explicit_unsigned_const:
       CHECKSUM (at->dw_attr_val.v.val_unsigned);
       break;
     case dw_val_class_const_double:
@@ -6277,6 +6303,7 @@  attr_checksum_ordered (enum dwarf_tag tag, dw_attr_node *at,
       break;
 
     case dw_val_class_unsigned_const:
+    case dw_val_class_explicit_unsigned_const:
       CHECKSUM_ULEB128 (DW_FORM_sdata);
       CHECKSUM_SLEB128 ((int) at->dw_attr_val.v.val_unsigned);
       break;
@@ -6784,6 +6811,7 @@  same_dw_val_p (const dw_val_node *v1, const dw_val_node *v2, int *mark)
     case dw_val_class_const:
       return v1->v.val_int == v2->v.val_int;
     case dw_val_class_unsigned_const:
+    case dw_val_class_explicit_unsigned_const:
       return v1->v.val_unsigned == v2->v.val_unsigned;
     case dw_val_class_const_double:
       return v1->v.val_double.high == v2->v.val_double.high
@@ -8434,6 +8462,9 @@  size_of_die (dw_die_ref die)
 	      size += csize;
 	  }
 	  break;
+	case dw_val_class_explicit_unsigned_const:
+	  size += size_of_uleb128 (AT_explicit_unsigned (a));
+	  break;
 	case dw_val_class_const_double:
 	  size += HOST_BITS_PER_DOUBLE_INT / HOST_BITS_PER_CHAR;
 	  if (HOST_BITS_PER_WIDE_INT >= 64)
@@ -8814,6 +8845,8 @@  value_format (dw_attr_node *a)
 	default:
 	  gcc_unreachable ();
 	}
+    case dw_val_class_explicit_unsigned_const:
+      return DW_FORM_udata;
     case dw_val_class_const_double:
       switch (HOST_BITS_PER_WIDE_INT)
 	{
@@ -9280,6 +9313,10 @@  output_die (dw_die_ref die)
 	  }
 	  break;
 
+	case dw_val_class_explicit_unsigned_const:
+	  dw2_asm_output_data_uleb128 (AT_explicit_unsigned (a), "%s", name);
+	  break;
+
 	case dw_val_class_const_double:
 	  {
 	    unsigned HOST_WIDE_INT first, second;
@@ -19735,15 +19772,23 @@  gen_enumeration_type_die (tree type, dw_die_ref context_die)
 	  if (simple_type_size_in_bits (TREE_TYPE (value))
 	      <= HOST_BITS_PER_WIDE_INT || tree_fits_shwi_p (value))
 	    {
-	      /* For constant forms created by add_AT_unsigned DWARF
-		 consumers (GDB, elfutils, etc.) always zero extend
-		 the value.  Only when the actual value is negative
-		 do we need to use add_AT_int to generate a constant
-		 form that can represent negative values.  */
+	      /* DW_TAG_enumeration_type DIEs do not describe type signedness.
+		 However, this information is required when enumeration
+		 subranges are considered: this happens for instance in
+		 DW_TAG_subrange_type DIEs or in DW_DSC_range discriminant
+		 descriptions.  Because of this, unsigned values must be
+		 explicitely described as unsigned.
+
+		 This satisfies with the DWARF recommandation (section 7.5.4
+		 Attribute Encodings):
+
+		   Producers are therefore strongly encouraged to use
+		   DW_FORM_sdata or DW_FORM_udata for signed and unsigned
+		   integers respectively, rather than DW_FORM_data<n>.  */
 	      HOST_WIDE_INT val = TREE_INT_CST_LOW (value);
-	      if (TYPE_UNSIGNED (TREE_TYPE (value)) || val >= 0)
-		add_AT_unsigned (enum_die, DW_AT_const_value,
-				 (unsigned HOST_WIDE_INT) val);
+	      if (TYPE_UNSIGNED (TREE_TYPE (value)))
+		add_AT_explicit_unsigned (enum_die, DW_AT_const_value,
+					  (unsigned HOST_WIDE_INT) val);
 	      else
 		add_AT_int (enum_die, DW_AT_const_value, val);
 	    }
diff --git a/gcc/dwarf2out.h b/gcc/dwarf2out.h
index abf0550..d4e65df 100644
--- a/gcc/dwarf2out.h
+++ b/gcc/dwarf2out.h
@@ -137,6 +137,7 @@  enum dw_val_class
   dw_val_class_range_list,
   dw_val_class_const,
   dw_val_class_unsigned_const,
+  dw_val_class_explicit_unsigned_const,
   dw_val_class_const_double,
   dw_val_class_wide_int,
   dw_val_class_vec,
@@ -199,6 +200,8 @@  struct GTY(()) dw_val_node {
       dw_loc_descr_ref GTY ((tag ("dw_val_class_loc"))) val_loc;
       HOST_WIDE_INT GTY ((default)) val_int;
       unsigned HOST_WIDE_INT GTY ((tag ("dw_val_class_unsigned_const"))) val_unsigned;
+      unsigned HOST_WIDE_INT GTY ((tag ("dw_val_class_explicit_unsigned_const")))
+	val_explicit_unsigned;
       double_int GTY ((tag ("dw_val_class_const_double"))) val_double;
       wide_int_ptr GTY ((tag ("dw_val_class_wide_int"))) val_wide;
       dw_vec_const GTY ((tag ("dw_val_class_vec"))) val_vec;
diff --git a/gcc/testsuite/gnat.dg/debug10.adb b/gcc/testsuite/gnat.dg/debug10.adb
new file mode 100644
index 0000000..ef66889
--- /dev/null
+++ b/gcc/testsuite/gnat.dg/debug10.adb
@@ -0,0 +1,39 @@ 
+--  { dg-options "-g -cargs -dA -margs" }
+--
+--  First, check that there are exactly:
+--    * two abbreviations for DW_TAG_enumerator
+--    * two DW_AT_const_value attributes in abbreviations
+--
+--    { dg-final { scan-assembler-times "\\(TAG: DW_TAG_enumerator\\)" 2 } }
+--    { dg-final { scan-assembler-times "\\(DW_AT_const_value\\)" 2 } }
+--
+--  Then, check that the const values are respectively explicitly unsigned
+--  (udata) and signed.  The following pattern is kind of weak because it does
+--  not check that 1) DW_AT_const_value attributes are indeed related to
+--  DW_TAG_enumerator DIEs and that 2) the DW_FORM_* are related to the
+--  DW_AT_const_value attributes... but we're doing as much as we can with
+--  regexp matching...
+--
+--    { dg-final { scan-assembler "\\(TAG: DW_TAG_enumerator\\).*\\(DW_AT_const_value\\).*\\(DW_FORM_udata\\).*\\(TAG: DW_TAG_enumerator\\).*\\(DW_AT_const_value\\).*\\(DW_FORM_sdata\\)" } }
+
+procedure Debug10 is
+
+   type Default_Type is (A, B, C);
+
+   type Signed_Type is (X, Y, Z);
+   for Signed_Type use (-1, 0, 1);
+
+   procedure Ignore (E : Default_Type) is
+   begin
+      null;
+   end Ignore;
+
+   procedure Ignore (E : Signed_Type) is
+   begin
+      null;
+   end Ignore;
+
+begin
+   Ignore (A);
+   Ignore (X);
+end Debug10;