Patchwork [RFC] More compact (100x) -g3 .debug_macinfo

Submitter Tom Tromey
Date July 13, 2011, 7:36 p.m.
Message ID <m3mxgix9b0.fsf@fleche.redhat.com>
Permalink /patch/104591/
State New

Comments

Tom Tromey - July 13, 2011, 7:36 p.m.
>>>>> "Jakub" == Jakub Jelinek <jakub@redhat.com> writes:

Jakub> Currently .debug_macinfo is prohibitively large, because it doesn't
Jakub> allow for any kind of merging of duplicate debug information.

Jakub> This patch is an RFC for extensions that bring it down to
Jakub> manageable levels.

I wrote a gdb patch for this.  I've appended it in case you want to try
it out; it is against git master.  I tried it a little on an executable
Jakub sent me and it seems to work fine.

It is no trouble to change this patch if you change the format.  It
wasn't hard to write in the first place; it's just bigger than it would
otherwise be because I moved a bunch of code into a new function.

I don't think I really understood DW_MACINFO_GNU_define_opcode, so the
implementation here is probably wrong.

Tom

2011-07-13  Tom Tromey  <tromey@redhat.com>

	* dwarf2read.c (read_indirect_string_at_offset): New function.
	(read_indirect_string): Use it.
	(dwarf_decode_macro_bytes): New function, taken from
	dwarf_decode_macros.  Handle DW_MACINFO_GNU_*.
	(dwarf_decode_macros): Use it.  Handle DW_MACINFO_GNU_*.
Jakub Jelinek - July 13, 2011, 8:17 p.m.
On Wed, Jul 13, 2011 at 01:36:03PM -0600, Tom Tromey wrote:
> I wrote a gdb patch for this.  I've appended it in case you want to try
> it out; it is against git master.  I tried it a little on an executable
> Jakub sent me and it seems to work fine.

Thanks.

> It is no trouble to change this patch if you change the format.  It
> wasn't hard to write in the first place; it's just bigger than it would
> otherwise be because I moved a bunch of code into a new function.
> 
> I don't think I really understood DW_MACINFO_GNU_define_opcode, so the
> implementation here is probably wrong.

Well, I think you've skipped it correctly and furthermore even patched
GCC doesn't emit it.  The point of it was to allow skipping unknown
opcodes.  If you implement this opcode fully and say GCC 4.8 adds a new
vendor opcode, the old implementation would be able to silently skip
over such opcodes.
So, the reader implementation could keep a table of 256 pointers.  At
the start of parsing a particular .debug_macinfo chunk it clears the
table (or, when the chunk is read because of
DW_MACINFO_GNU_transparent_include4, it instead makes a copy of the
current table and makes the copy current).  When you encounter
DW_MACINFO_GNU_define_opcode, you store into the table a pointer to the
encoded operands of that opcode.  When you then find an unknown opcode
(reach the default: case) and table[op] is non-NULL, you read the
uleb128 at that location to get the operand count, iterate over the
DW_FORM_* values that follow it, and for each of them skip the
corresponding bytes of the opcode's operands.  For example, a
.debug_macinfo chunk could start with

  DW_MACINFO_GNU_define_opcode, 0xe5, 2, DW_FORM_udata, DW_FORM_block,
  DW_MACINFO_define, 0, "A 1",
  0xe5, 0x80, 0x7f, 5, 1, 2, 3, 4, 5,
  DW_MACINFO_define, 0, "B 1",
  0

and you'd be able to grok both defines in it, because you'd understand
that after seeing 0xe5 you need to read one uleb128, then another
uleb128, and skip as many bytes as the second one says.

The point of copying the table is that the producer could emit
define_opcode only in the .debug_macinfo chunk referenced from
DW_AT_macro_info, without repeating it in the transparent include
chains, provided it ensures those chains are never merged unless every
.debug_macinfo chunk that references them contains the define_opcode.
Copying also lets a transparent chain add new opcodes or redefine
existing ones without affecting the outer sequence.
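For illustration, the table-driven skipping described above might look roughly like the following C sketch. It is not the gdb implementation. `decode_chunk`, `skip_form`, and the `MI_*`/`FORM_*` names are hypothetical. The value 0xe3 for DW_MACINFO_GNU_define_opcode is taken from the proposed dwarf2.h hunk. The sketch assumes each DW_FORM_* code occupies one byte (the skip in Tom's patch makes the same assumption), supports only a few forms, and does no bounds checking. It decodes the example chunk above, including the otherwise unknown 0xe5 record.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Standard DW_MACINFO_* and DW_FORM_* values; 0xe3 is the proposed
   DW_MACINFO_GNU_define_opcode from the patch's dwarf2.h hunk.  */
enum { MI_define = 1, MI_undef = 2, MI_start_file = 3, MI_end_file = 4,
       MI_GNU_define_opcode = 0xe3 };
enum { FORM_string = 0x08, FORM_block = 0x09, FORM_data1 = 0x0b,
       FORM_udata = 0x0f };

static uint64_t read_uleb128 (const uint8_t **p)
{
  uint64_t result = 0;
  unsigned shift = 0;
  uint8_t byte;

  do {
    byte = *(*p)++;
    result |= (uint64_t) (byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  return result;
}

/* Skip one operand of form FORM; returns 0 for forms this sketch lacks.  */
static int skip_form (const uint8_t **p, uint8_t form)
{
  uint64_t len;
  switch (form)
    {
    case FORM_data1:  *p += 1; return 1;
    case FORM_udata:  read_uleb128 (p); return 1;
    case FORM_block:  len = read_uleb128 (p); *p += len; return 1;
    case FORM_string: *p += strlen ((const char *) *p) + 1; return 1;
    default:          return 0;
    }
}

/* Decode one chunk.  PARENT is NULL for the chunk named by DW_AT_macro_info
   (empty opcode table), or the includer's table for a transparent include
   (start from a private copy).  Returns the DW_MACINFO_define count or -1.  */
static int decode_chunk (const uint8_t *p, const uint8_t *const *parent)
{
  const uint8_t *table[256] = { 0 };
  const uint8_t *desc; const char *body;
  uint64_t line, count, i;
  int ndefines = 0;
  uint8_t op;

  if (parent != NULL)
    memcpy (table, parent, sizeof table);
  for (;;)
    switch (op = *p++)
      {
      case 0:
        return ndefines;
      case MI_define:
      case MI_undef:
        line = read_uleb128 (&p);
        body = (const char *) p;
        p += strlen (body) + 1;
        printf ("%s line %llu: %s\n", op == MI_define ? "define" : "undef",
                (unsigned long long) line, body);
        ndefines += op == MI_define;
        break;
      case MI_start_file:           /* line and file number */
        read_uleb128 (&p);
        read_uleb128 (&p);
        break;
      case MI_end_file: break;
      case MI_GNU_define_opcode:
        table[*p] = p + 1;          /* -> uleb128 count, then the forms */
        p++;
        count = read_uleb128 (&p);
        p += count;                 /* one byte per DW_FORM_* code */
        break;
      default:                      /* unknown: skip it if described */
        desc = table[op];
        if (desc == NULL)
          return -1;
        count = read_uleb128 (&desc);
        for (i = 0; i < count; i++)
          if (!skip_form (&p, desc[i]))
            return -1;
        break;
      }
}

/* Jakub's example: opcode 0xe5 takes a DW_FORM_udata and a DW_FORM_block.  */
static const uint8_t example_chunk[] = {
  MI_GNU_define_opcode, 0xe5, 2, FORM_udata, FORM_block,
  MI_define, 0, 'A', ' ', '1', 0,
  0xe5, 0x80, 0x7f, 5, 1, 2, 3, 4, 5,
  MI_define, 0, 'B', ' ', '1', 0,
  0
};
```

Calling `decode_chunk (example_chunk, NULL)` reports both defines. Handling a transparent include inside `decode_chunk` would amount to a recursive call such as `decode_chunk (included_ptr, table)`, where `included_ptr` is a hypothetical pointer to the included chain. The nested chain would then start from a copy, so any opcodes it defines or redefines stay local to it.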

	Jakub
Tom Tromey - July 18, 2011, 2:43 p.m.
>>>>> "Jakub" == Jakub Jelinek <jakub@redhat.com> writes:

Tom> I don't think I really understood DW_MACINFO_GNU_define_opcode, so the
Tom> implementation here is probably wrong.

Jakub> Well, I think you've skipped it correctly and furthermore even patched
Jakub> GCC doesn't emit it.  The point of it was to allow skipping unknown
Jakub> opcodes.  If you implement this opcode fully and say GCC 4.8 adds a new
Jakub> vendor opcode, the old implementation would be able to silently skip
Jakub> over such opcodes.

I implemented this part today, so I think the gdb patch is complete now.

Tom

Patch

diff --git a/gdb/dwarf2read.c b/gdb/dwarf2read.c
index fde5b6a..af35f16 100644
--- a/gdb/dwarf2read.c
+++ b/gdb/dwarf2read.c
@@ -10182,32 +10182,32 @@  read_direct_string (bfd *abfd, gdb_byte *buf, unsigned int *bytes_read_ptr)
 }
 
 static char *
-read_indirect_string (bfd *abfd, gdb_byte *buf,
-		      const struct comp_unit_head *cu_header,
-		      unsigned int *bytes_read_ptr)
+read_indirect_string_at_offset (bfd *abfd, LONGEST str_offset)
 {
-  LONGEST str_offset = read_offset (abfd, buf, cu_header, bytes_read_ptr);
-
   dwarf2_read_section (dwarf2_per_objfile->objfile, &dwarf2_per_objfile->str);
   if (dwarf2_per_objfile->str.buffer == NULL)
-    {
-      error (_("DW_FORM_strp used without .debug_str section [in module %s]"),
-		      bfd_get_filename (abfd));
-      return NULL;
-    }
+    error (_("DW_FORM_strp used without .debug_str section [in module %s]"),
+	   bfd_get_filename (abfd));
   if (str_offset >= dwarf2_per_objfile->str.size)
-    {
-      error (_("DW_FORM_strp pointing outside of "
-	       ".debug_str section [in module %s]"),
-	     bfd_get_filename (abfd));
-      return NULL;
-    }
+    error (_("DW_FORM_strp pointing outside of "
+	     ".debug_str section [in module %s]"),
+	   bfd_get_filename (abfd));
   gdb_assert (HOST_CHAR_BIT == 8);
   if (dwarf2_per_objfile->str.buffer[str_offset] == '\0')
     return NULL;
   return (char *) (dwarf2_per_objfile->str.buffer + str_offset);
 }
 
+static char *
+read_indirect_string (bfd *abfd, gdb_byte *buf,
+		      const struct comp_unit_head *cu_header,
+		      unsigned int *bytes_read_ptr)
+{
+  LONGEST str_offset = read_offset (abfd, buf, cu_header, bytes_read_ptr);
+
+  return read_indirect_string_at_offset (abfd, str_offset);
+}
+
 static unsigned long
 read_unsigned_leb128 (bfd *abfd, gdb_byte *buf, unsigned int *bytes_read_ptr)
 {
@@ -14576,116 +14576,14 @@  parse_macro_definition (struct macro_source_file *file, int line,
 
 
 static void
-dwarf_decode_macros (struct line_header *lh, unsigned int offset,
-                     char *comp_dir, bfd *abfd,
-                     struct dwarf2_cu *cu)
+dwarf_decode_macro_bytes (bfd *abfd, gdb_byte *mac_ptr, gdb_byte *mac_end,
+			  struct macro_source_file *current_file,
+			  struct line_header *lh, char *comp_dir,
+			  struct dwarf2_cu *cu)
 {
-  gdb_byte *mac_ptr, *mac_end;
-  struct macro_source_file *current_file = 0;
   enum dwarf_macinfo_record_type macinfo_type;
   int at_commandline;
 
-  dwarf2_read_section (dwarf2_per_objfile->objfile,
-		       &dwarf2_per_objfile->macinfo);
-  if (dwarf2_per_objfile->macinfo.buffer == NULL)
-    {
-      complaint (&symfile_complaints, _("missing .debug_macinfo section"));
-      return;
-    }
-
-  /* First pass: Find the name of the base filename.
-     This filename is needed in order to process all macros whose definition
-     (or undefinition) comes from the command line.  These macros are defined
-     before the first DW_MACINFO_start_file entry, and yet still need to be
-     associated to the base file.
-
-     To determine the base file name, we scan the macro definitions until we
-     reach the first DW_MACINFO_start_file entry.  We then initialize
-     CURRENT_FILE accordingly so that any macro definition found before the
-     first DW_MACINFO_start_file can still be associated to the base file.  */
-
-  mac_ptr = dwarf2_per_objfile->macinfo.buffer + offset;
-  mac_end = dwarf2_per_objfile->macinfo.buffer
-    + dwarf2_per_objfile->macinfo.size;
-
-  do
-    {
-      /* Do we at least have room for a macinfo type byte?  */
-      if (mac_ptr >= mac_end)
-        {
-	  /* Complaint is printed during the second pass as GDB will probably
-	     stop the first pass earlier upon finding
-	     DW_MACINFO_start_file.  */
-	  break;
-        }
-
-      macinfo_type = read_1_byte (abfd, mac_ptr);
-      mac_ptr++;
-
-      switch (macinfo_type)
-        {
-          /* A zero macinfo type indicates the end of the macro
-             information.  */
-        case 0:
-	  break;
-
-	case DW_MACINFO_define:
-	case DW_MACINFO_undef:
-	  /* Only skip the data by MAC_PTR.  */
-	  {
-	    unsigned int bytes_read;
-
-	    read_unsigned_leb128 (abfd, mac_ptr, &bytes_read);
-	    mac_ptr += bytes_read;
-	    read_direct_string (abfd, mac_ptr, &bytes_read);
-	    mac_ptr += bytes_read;
-	  }
-	  break;
-
-	case DW_MACINFO_start_file:
-	  {
-	    unsigned int bytes_read;
-	    int line, file;
-
-	    line = read_unsigned_leb128 (abfd, mac_ptr, &bytes_read);
-	    mac_ptr += bytes_read;
-	    file = read_unsigned_leb128 (abfd, mac_ptr, &bytes_read);
-	    mac_ptr += bytes_read;
-
-	    current_file = macro_start_file (file, line, current_file,
-					     comp_dir, lh, cu->objfile);
-	  }
-	  break;
-
-	case DW_MACINFO_end_file:
-	  /* No data to skip by MAC_PTR.  */
-	  break;
-
-	case DW_MACINFO_vendor_ext:
-	  /* Only skip the data by MAC_PTR.  */
-	  {
-	    unsigned int bytes_read;
-
-	    read_unsigned_leb128 (abfd, mac_ptr, &bytes_read);
-	    mac_ptr += bytes_read;
-	    read_direct_string (abfd, mac_ptr, &bytes_read);
-	    mac_ptr += bytes_read;
-	  }
-	  break;
-
-	default:
-	  break;
-	}
-    } while (macinfo_type != 0 && current_file == NULL);
-
-  /* Second pass: Process all entries.
-
-     Use the AT_COMMAND_LINE flag to determine whether we are still processing
-     command-line macro definitions/undefinitions.  This flag is unset when we
-     reach the first DW_MACINFO_start_file entry.  */
-
-  mac_ptr = dwarf2_per_objfile->macinfo.buffer + offset;
-
   /* Determines if GDB is still before first DW_MACINFO_start_file.  If true
      GDB is still reading the definitions from command line.  First
      DW_MACINFO_start_file will need to be ignored as it was already executed
@@ -14716,27 +14614,43 @@  dwarf_decode_macros (struct line_header *lh, unsigned int offset,
 
         case DW_MACINFO_define:
         case DW_MACINFO_undef:
+	case DW_MACINFO_GNU_define_indirect4:
+	case DW_MACINFO_GNU_undef_indirect4:
           {
             unsigned int bytes_read;
             int line;
             char *body;
+	    int is_define;
 
-            line = read_unsigned_leb128 (abfd, mac_ptr, &bytes_read);
-            mac_ptr += bytes_read;
-            body = read_direct_string (abfd, mac_ptr, &bytes_read);
-            mac_ptr += bytes_read;
+	    line = read_unsigned_leb128 (abfd, mac_ptr, &bytes_read);
+	    mac_ptr += bytes_read;
+
+	    if (macinfo_type == DW_MACINFO_define
+		|| macinfo_type == DW_MACINFO_undef)
+	      {
+		body = read_direct_string (abfd, mac_ptr, &bytes_read);
+		mac_ptr += bytes_read;
+	      }
+	    else
+	      {
+		LONGEST str_offset;
+
+		str_offset = read_offset_1 (abfd, mac_ptr, 4);
+		mac_ptr += 4;
+
+		body = read_indirect_string_at_offset (abfd, str_offset);
+	      }
 
+	    is_define = (macinfo_type == DW_MACINFO_define
+			 || macinfo_type == DW_MACINFO_GNU_define_indirect4);
             if (! current_file)
 	      {
 		/* DWARF violation as no main source is present.  */
 		complaint (&symfile_complaints,
 			   _("debug info with no main source gives macro %s "
 			     "on line %d: %s"),
-			   macinfo_type == DW_MACINFO_define ?
-			     _("definition") :
-			       macinfo_type == DW_MACINFO_undef ?
-				 _("undefinition") :
-				 _("something-or-other"), line, body);
+			   is_define ? _("definition") : _("undefinition"),
+			   line, body);
 		break;
 	      }
 	    if ((line == 0 && !at_commandline)
@@ -14744,17 +14658,17 @@  dwarf_decode_macros (struct line_header *lh, unsigned int offset,
 	      complaint (&symfile_complaints,
 			 _("debug info gives %s macro %s with %s line %d: %s"),
 			 at_commandline ? _("command-line") : _("in-file"),
-			 macinfo_type == DW_MACINFO_define ?
-			   _("definition") :
-			     macinfo_type == DW_MACINFO_undef ?
-			       _("undefinition") :
-			       _("something-or-other"),
+			 is_define ? _("definition") : _("undefinition"),
 			 line == 0 ? _("zero") : _("non-zero"), line, body);
 
-	    if (macinfo_type == DW_MACINFO_define)
+	    if (is_define)
 	      parse_macro_definition (current_file, line, body);
-	    else if (macinfo_type == DW_MACINFO_undef)
-	      macro_undef (current_file, line, body);
+	    else
+	      {
+		gdb_assert (macinfo_type == DW_MACINFO_undef
+			    || macinfo_type == DW_MACINFO_GNU_undef_indirect4);
+		macro_undef (current_file, line, body);
+	      }
           }
           break;
 
@@ -14825,6 +14739,33 @@  dwarf_decode_macros (struct line_header *lh, unsigned int offset,
             }
           break;
 
+	case DW_MACINFO_GNU_transparent_include4:
+	  {
+	    LONGEST offset;
+
+	    offset = read_offset_1 (abfd, mac_ptr, 4);
+	    mac_ptr += 4;
+
+	    dwarf_decode_macro_bytes (abfd,
+				      (dwarf2_per_objfile->macinfo.buffer
+				       + offset),
+				      mac_end, current_file,
+				      lh, comp_dir, cu);
+	  }
+	  break;
+
+	case DW_MACINFO_GNU_define_opcode:
+	  {
+	    unsigned int bytes_read, arg;
+
+	    /* Just ignore it.  */
+	    mac_ptr += 1;
+	    arg = read_unsigned_leb128 (abfd, mac_ptr, &bytes_read);
+	    mac_ptr += bytes_read;
+	    mac_ptr += arg;
+	  }
+	  break;
+
         case DW_MACINFO_vendor_ext:
           {
             unsigned int bytes_read;
@@ -14842,6 +14783,149 @@  dwarf_decode_macros (struct line_header *lh, unsigned int offset,
     } while (macinfo_type != 0);
 }
 
+static void
+dwarf_decode_macros (struct line_header *lh, unsigned int offset,
+                     char *comp_dir, bfd *abfd,
+                     struct dwarf2_cu *cu)
+{
+  gdb_byte *mac_ptr, *mac_end;
+  struct macro_source_file *current_file = 0;
+  enum dwarf_macinfo_record_type macinfo_type;
+
+  dwarf2_read_section (dwarf2_per_objfile->objfile,
+		       &dwarf2_per_objfile->macinfo);
+  if (dwarf2_per_objfile->macinfo.buffer == NULL)
+    {
+      complaint (&symfile_complaints, _("missing .debug_macinfo section"));
+      return;
+    }
+
+  /* First pass: Find the name of the base filename.
+     This filename is needed in order to process all macros whose definition
+     (or undefinition) comes from the command line.  These macros are defined
+     before the first DW_MACINFO_start_file entry, and yet still need to be
+     associated to the base file.
+
+     To determine the base file name, we scan the macro definitions until we
+     reach the first DW_MACINFO_start_file entry.  We then initialize
+     CURRENT_FILE accordingly so that any macro definition found before the
+     first DW_MACINFO_start_file can still be associated to the base file.  */
+
+  mac_ptr = dwarf2_per_objfile->macinfo.buffer + offset;
+  mac_end = dwarf2_per_objfile->macinfo.buffer
+    + dwarf2_per_objfile->macinfo.size;
+
+  do
+    {
+      /* Do we at least have room for a macinfo type byte?  */
+      if (mac_ptr >= mac_end)
+        {
+	  /* Complaint is printed during the second pass as GDB will probably
+	     stop the first pass earlier upon finding
+	     DW_MACINFO_start_file.  */
+	  break;
+        }
+
+      macinfo_type = read_1_byte (abfd, mac_ptr);
+      mac_ptr++;
+
+      switch (macinfo_type)
+        {
+          /* A zero macinfo type indicates the end of the macro
+             information.  */
+        case 0:
+	  break;
+
+	case DW_MACINFO_define:
+	case DW_MACINFO_undef:
+	  /* Only skip the data by MAC_PTR.  */
+	  {
+	    unsigned int bytes_read;
+
+	    read_unsigned_leb128 (abfd, mac_ptr, &bytes_read);
+	    mac_ptr += bytes_read;
+	    read_direct_string (abfd, mac_ptr, &bytes_read);
+	    mac_ptr += bytes_read;
+	  }
+	  break;
+
+	case DW_MACINFO_start_file:
+	  {
+	    unsigned int bytes_read;
+	    int line, file;
+
+	    line = read_unsigned_leb128 (abfd, mac_ptr, &bytes_read);
+	    mac_ptr += bytes_read;
+	    file = read_unsigned_leb128 (abfd, mac_ptr, &bytes_read);
+	    mac_ptr += bytes_read;
+
+	    current_file = macro_start_file (file, line, current_file,
+					     comp_dir, lh, cu->objfile);
+	  }
+	  break;
+
+	case DW_MACINFO_end_file:
+	  /* No data to skip by MAC_PTR.  */
+	  break;
+
+	case DW_MACINFO_vendor_ext:
+	  /* Only skip the data by MAC_PTR.  */
+	  {
+	    unsigned int bytes_read;
+
+	    read_unsigned_leb128 (abfd, mac_ptr, &bytes_read);
+	    mac_ptr += bytes_read;
+	    read_direct_string (abfd, mac_ptr, &bytes_read);
+	    mac_ptr += bytes_read;
+	  }
+	  break;
+
+	case DW_MACINFO_GNU_define_indirect4:
+	case DW_MACINFO_GNU_undef_indirect4:
+	  {
+	    unsigned int bytes_read;
+
+	    read_unsigned_leb128 (abfd, mac_ptr, &bytes_read);
+	    mac_ptr += bytes_read;
+	    mac_ptr += 4;
+	  }
+	  break;
+
+	case DW_MACINFO_GNU_transparent_include4:
+	  /* Note that, according to the spec, a transparent include
+	     chain cannot call DW_MACINFO_start_file.  So, we can just
+	     skip this opcode.  */
+	  mac_ptr += 4;
+	  break;
+
+	case DW_MACINFO_GNU_define_opcode:
+	  {
+	    unsigned int bytes_read, arg;
+
+	    mac_ptr += 1;
+	    arg = read_unsigned_leb128 (abfd, mac_ptr, &bytes_read);
+	    mac_ptr += bytes_read;
+	    mac_ptr += arg;
+	  }
+	  break;
+
+	default:
+	  break;
+	}
+    } while (macinfo_type != 0 && current_file == NULL);
+
+  /* Second pass: Process all entries.
+
+     Use the AT_COMMAND_LINE flag to determine whether we are still processing
+     command-line macro definitions/undefinitions.  This flag is unset when we
+     reach the first DW_MACINFO_start_file entry.  */
+
+  mac_ptr = dwarf2_per_objfile->macinfo.buffer + offset;
+
+  dwarf_decode_macro_bytes (abfd, mac_ptr, mac_end, current_file,
+			    lh, comp_dir, cu);
+}
+
 /* Check if the attribute's form is a DW_FORM_block*
    if so return true else false.  */
 static int
diff --git a/include/dwarf2.h b/include/dwarf2.h
index b2806ef..40a8a66 100644
--- a/include/dwarf2.h
+++ b/include/dwarf2.h
@@ -877,7 +877,13 @@  enum dwarf_macinfo_record_type
     DW_MACINFO_undef = 2,
     DW_MACINFO_start_file = 3,
     DW_MACINFO_end_file = 4,
-    DW_MACINFO_vendor_ext = 255
+    DW_MACINFO_lo_user = 0xe0,
+    DW_MACINFO_GNU_define_indirect4 = 0xe0,
+    DW_MACINFO_GNU_undef_indirect4 = 0xe1,
+    DW_MACINFO_GNU_transparent_include4 = 0xe2,
+    DW_MACINFO_GNU_define_opcode = 0xe3,
+    DW_MACINFO_hi_user = 0xfe,
+    DW_MACINFO_vendor_ext = 0xff
   };
 
 /* @@@ For use with GNU frame unwind information.  */
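To make the new record layouts handled above concrete, here is a minimal standalone C sketch. It is not gdb code. It assumes the opcode values from the dwarf2.h hunk, 4-byte offsets stored little-endian, and otherwise well-formed input. `struct sections`, `decode`, and the sample section contents are hypothetical. DW_MACINFO_GNU_define_indirect4 and DW_MACINFO_GNU_undef_indirect4 carry a uleb128 line followed by a 4-byte .debug_str offset. DW_MACINFO_GNU_transparent_include4 carries a 4-byte .debug_macinfo offset whose chain is decoded in place. This lets several units share one copy of identical macro data.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Opcode values as proposed in the patch's dwarf2.h hunk.  */
enum { MI_define = 1, MI_undef = 2,
       MI_GNU_define_indirect4 = 0xe0, MI_GNU_undef_indirect4 = 0xe1,
       MI_GNU_transparent_include4 = 0xe2 };

struct sections     /* hypothetical view of the loaded sections */
{
  const uint8_t *macinfo; size_t macinfo_size;
  const char *str; size_t str_size;
};

static uint64_t read_uleb128 (const uint8_t **p)
{
  uint64_t result = 0;
  unsigned shift = 0;
  uint8_t byte;

  do {
    byte = *(*p)++;
    result |= (uint64_t) (byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  return result;
}

/* 4-byte section offset; this sketch assumes a little-endian target.  */
static uint32_t read_offset4 (const uint8_t **p)
{
  const uint8_t *b = *p;
  *p += 4;
  return (uint32_t) b[0] | (uint32_t) b[1] << 8
         | (uint32_t) b[2] << 16 | (uint32_t) b[3] << 24;
}

/* Print and count the define/undef records reachable from OFFSET,
   decoding transparent includes in place.  Returns -1 on bad input.  */
static int decode (const struct sections *s, uint32_t offset, int depth)
{
  const uint8_t *p, *end = s->macinfo + s->macinfo_size;
  const char *body; uint64_t line; uint32_t str_offset;
  int n = 0, sub;

  if (offset >= s->macinfo_size || depth > 32)
    return -1;
  for (p = s->macinfo + offset; p < end; )
    {
      uint8_t op = *p++;

      switch (op)
        {
        case 0: return n;
        case MI_define:
        case MI_undef:
          line = read_uleb128 (&p);
          body = (const char *) p;
          p += strlen (body) + 1;
          break;
        case MI_GNU_define_indirect4:
        case MI_GNU_undef_indirect4:
          line = read_uleb128 (&p);
          str_offset = read_offset4 (&p);
          if (str_offset >= s->str_size)  /* cf. read_indirect_string_at_offset */
            return -1;
          body = s->str + str_offset;
          break;
        case MI_GNU_transparent_include4:
          sub = decode (s, read_offset4 (&p), depth + 1);
          if (sub < 0)
            return -1;
          n += sub;
          continue;
        default: return -1;
        }
      printf ("%*s%s line %llu: %s\n", 2 * depth, "",
              (op == MI_define || op == MI_GNU_define_indirect4)
              ? "define" : "undef", (unsigned long long) line, body);
      n++;
    }
  return -1;   /* no terminating 0 before the end of the section */
}

static const char sample_str[] = "FOO 1\0BAR";   /* strings at +0 and +6 */
static const uint8_t sample_macinfo[] = {
  MI_GNU_define_indirect4, 1, 0, 0, 0, 0,   /* +0: shared chunk, line 1 */
  MI_GNU_undef_indirect4, 2, 6, 0, 0, 0,    /* line 2, .debug_str+6 */
  0,
  MI_define, 3, 'X', ' ', '1', 0,           /* +13: first unit */
  MI_GNU_transparent_include4, 0, 0, 0, 0, 0,  /* include +0; end */
  MI_GNU_transparent_include4, 0, 0, 0, 0, 0   /* +25: second unit */
};
static const struct sections sample
  = { sample_macinfo, sizeof sample_macinfo, sample_str, sizeof sample_str };
```

In this sample, `decode (&sample, 13, 0)` and `decode (&sample, 25, 0)` both reach the shared definitions at offset 0. The second unit's own chunk is only six bytes because the shared records are stored once. The real dwarf_decode_macro_bytes also tracks the current source file and the command-line state, which this sketch omits.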