Message ID | 3a7946e9-d178-f878-9774-64ff44bcf5df@redhat.com |
---|---|
State | New |
Headers | show |
On 06/29/2017 05:53 PM, Pedro Alves wrote: > however, we'd need to still make "whatis" strip one level of > typedefs when the argument is a type, somehow, because we now > get this: > > (gdb) whatis zzz > type = zzz I suspect we can do that by making "whatis" look at the top of the expression tree, see if it's an OP_TYPE: (gdb) set debug expression 1 (gdb) whatis (zzz)0 ... Dump of expression @ 0x1bc6af0, after conversion to prefix form: Expression: `(zzz) 0' Language c, 8 elements, 16 bytes each. 0 UNOP_CAST_TYPE ( 1 OP_TYPE Type @0x18e7920 (zzz)) 4 OP_LONG Type @0x19a6650 (int), value 0 (0x0) type = zzz (gdb) whatis zzz ... Dump of expression @ 0x1bc87b0, after conversion to prefix form: Expression: `zzz' Language c, 3 elements, 16 bytes each. 0 OP_TYPE Type @0x18e7920 (zzz) type = zzz (gdb) whatis z ... Dump of expression @ 0x1bc6af0, after conversion to prefix form: Expression: `z' Language c, 4 elements, 16 bytes each. 0 OP_VAR_VALUE Block @0x19e8ee0, symbol @0x19e8e10 (z) type = zzz (gdb) There may be a helper for this already somewhere. I've never dug that much deeply into this area of the code. Thanks, Pedro Alves
On 06/29/2017 06:02 PM, Pedro Alves wrote: > I suspect we can do that by making "whatis" look at the top of > the expression tree, see if it's an OP_TYPE: That works: With this code: typedef int zzz; zzz z; gdb: (gdb) whatis z type = zzz (gdb) whatis (zzz)0 type = zzz (gdb) whatis zzz type = int and at least make check TESTS="*/whatis*.exp */*ptype*.exp" passes, which is promising. Haven't tried this against a printer, but I think it should start working. We may need OP_SCOPE too. I'll try running this against the full gdb testsuite, and write some test. Thanks, Pedro Alves
On Thu, Jun 29, 2017 at 1:28 PM, Pedro Alves <palves@redhat.com> wrote: > With this code: > > typedef int zzz; > zzz z; > > gdb: > > (gdb) whatis z > type = zzz > (gdb) whatis (zzz)0 > type = zzz > (gdb) whatis zzz > type = int For the glibc patch to go through, this case also needs to work: typedef int error_t; error_t var; extern error_t *errloc(void); int main(void) { return *errloc(); } (compiled to .o file) (gdb) ptype errloc No symbol "errloc" in current context. This might be a problem with inadequate debugging information generated by the compiler -- it does work correctly if a function *definition* is visible. But we need it to work given only the above, possibly with no debug symbols in the shared library defining "errloc". zw
On 06/30/2017 01:28 AM, Zack Weinberg wrote: > On Thu, Jun 29, 2017 at 1:28 PM, Pedro Alves <palves@redhat.com> wrote: >> With this code: >> >> typedef int zzz; >> zzz z; >> >> gdb: >> >> (gdb) whatis z >> type = zzz >> (gdb) whatis (zzz)0 >> type = zzz >> (gdb) whatis zzz >> type = int > I've now sent the patch to gdb-patches: https://sourceware.org/ml/gdb-patches/2017-06/msg00827.html Could one of you check whether the type printer kicks in with that, in the problematic cast case? Phil, might it be a good idea to convert what you were using to a real gdb testsuite test? > For the glibc patch to go through, this case also needs to work: > > typedef int error_t; > error_t var; > extern error_t *errloc(void); > > int main(void) > { > return *errloc(); > } > > (compiled to .o file) > > (gdb) ptype errloc > No symbol "errloc" in current context. > > This might be a problem with inadequate debugging information > generated by the compiler Debug info is generated for function definitions, not declarations. I.e., that declaration for "errloc" alone produces no debug info at all. Try 'readelf -dw' on the .o file. But why is this a valid scenario? Even if you have no debug info, GDB should be able to find the function's address in the elf dynamic symbol table? I'm not sure what you mean by "glibc patch to go through". If you mean acceptance upstream, then I'd consider this a separate issue, because now you're talking about being able to print the "errno" variable in the first place, even as a plain int with no printer? That is related, but orthogonal from the error_t type printer? Otherwise, can you clarify? -- it does work correctly if a function > *definition* is visible. But we need it to work given only the above, > possibly with no debug symbols in the shared library defining > "errloc". Won't __errno_location be always present in the dynamic symbol table? If you strip everything from libc.so.6 / libpthread.so.0 including dynamic symbols, leaving gdb with no clue about "__errno_location", let alone its address, there's no way that gdb can call it. (Unless you call it by address manually.) The next problem is that without debug info for __errno_location, gdb has no clue of its prototype, only that its a function, and so it assumes that it has type "int()", i.e., that it returns int, while in reality it returns int * / __error_t *. (Falling back to assuming "int" is IMO not the best idea, but I don't know the history behind it.) I.e., with your example .o + a separate .o file defining errloc (with no debug info), we now get: (gdb) ptype errloc type = int () and then calling "p *errloc()" (the equivalent of '#define errno (*__errno_location ())' silently subtly does the wrong thing. One dirty way around it would be for the printer to re-define the errno macro (using to cast __errno_location to the correct type before calling it, I guess: (gdb) macro define errno *(*(__error_t *(*) (void)) __errno_location) () That's make "errno" available when you compile with levels lower than -g3, too. Fedora gdb has a morally equivalent hack in the source code directly. Ideally, we'd find a clean way to make this all Just Work. Thanks, Pedro Alves
On 06/30/2017 05:37 PM, Pedro Alves wrote: > I've now sent the patch to gdb-patches: > > https://sourceware.org/ml/gdb-patches/2017-06/msg00827.html > > Could one of you check whether the type printer kicks in with that, > in the problematic cast case? I put it on the users/palves/whatis branch as well, btw. Thanks, Pedro Alves
On Fri, Jun 30, 2017 at 12:37 PM, Pedro Alves <palves@redhat.com> wrote: > On 06/30/2017 01:28 AM, Zack Weinberg wrote: >> For the glibc patch to go through, this case also needs to work: >> >> typedef int error_t; >> error_t var; >> extern error_t *errloc(void); >> >> int main(void) >> { >> return *errloc(); >> } >> >> (compiled to .o file) >> >> (gdb) ptype errloc >> No symbol "errloc" in current context. >> >> This might be a problem with inadequate debugging information >> generated by the compiler > > Debug info is generated for function definitions, not declarations. > I.e., that declaration for "errloc" alone produces no debug info > at all. Try 'readelf -dw' on the .o file. > > But why is this a valid scenario? Even if you have no debug info, > GDB should be able to find the function's address in the elf dynamic > symbol table? > > I'm not sure what you mean by "glibc patch to go through". > If you mean acceptance upstream, then I'd consider this a separate > issue, because now you're talking about being able to print the > "errno" variable in the first place, even as a plain int with no > printer? That is related, but orthogonal from the error_t type printer? > Otherwise, can you clarify? The _primary_ goal of the glibc patches is to trigger a pretty-printer when someone does (gdb) p errno with no further adornment. Since pretty-printers are keyed by type (and since people do store errno codes in other places than errno), this involves a type 'error_t' (and its implementation-namespace alias '__error_t'), but if we can't get 'p errno' to work, we're probably not going to bother. In all currently-supported configurations, this is glibc's definition of errno: extern int *__errno_location (void) __THROW __attribute__ ((__const__)); #define errno (*__errno_location ()) (__THROW is a macro expanding to 'throw ()' when compiling C++.) The patches change that to extern __error_t *__errno_location (void) __THROW __attribute__ ((__const__)); which should be sufficient to get the pretty-printing happening. But obviously that's only going to work if GDB actually knows that the return type of __errno_location is __error_t*, and it doesn't seem to, *even when debug information for the definition is available*: $ gdb /lib/x86_64-linux-gnu/libc.so.6 GNU gdb (Debian 7.12-6) 7.12.0.20161007-git ... Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug/.build-id/cc/80584889db7a969292959a46c718a2b1500702.debug...done. done. (gdb) ptype __errno_location type = int () (gdb) p __errno_location $1 = {<text variable, no debug info>} 0x20590 <__GI___errno_location> ... a suspicion occurs... (gdb) ptype __GI___errno_location type = int *(void) (gdb) p __GI___errno_location $2 = {int *(void)} 0x20590 <__GI___errno_location> ... so I guess this is a problem with the __GI_ indirection, which *may* be a thing we can resolve on our end. I don't fully understand that stuff. Still, It Would Be Nice™ if you _didn't_ have to have the debug symbols for libc.so installed for this to work. Probably a lot of people think you only need those if you are debugging libc itself. > The next problem is that without debug info for __errno_location, > gdb has no clue of its prototype, only that its a function, and so > it assumes that it has type "int()", i.e., that it returns int, > while in reality it returns int * / __error_t *. (Falling back > to assuming "int" is IMO not the best idea, but I don't know > the history behind it.) Probably because that's the pre-C89 legacy default function signature? > One dirty way around it would be for the printer to > re-define the errno macro (using to cast __errno_location to > the correct type before calling it, I guess: > > (gdb) macro define errno *(*(__error_t *(*) (void)) __errno_location) () > > That's make "errno" available when you compile with levels > lower than -g3, too. Hmm. How would one do that from inside Python? zw
On 06/30/2017 06:27 PM, Zack Weinberg wrote: > Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols > from /usr/lib/debug/.build-id/cc/80584889db7a969292959a46c718a2b1500702.debug...done. > done. > (gdb) ptype __errno_location > type = int () > (gdb) p __errno_location > $1 = {<text variable, no debug info>} 0x20590 <__GI___errno_location> > > ... a suspicion occurs... > > (gdb) ptype __GI___errno_location > type = int *(void) > (gdb) p __GI___errno_location > $2 = {int *(void)} 0x20590 <__GI___errno_location> > > ... so I guess this is a problem with the __GI_ indirection, which > *may* be a thing we can resolve on our end. I don't fully understand > that stuff. OK, thanks, that's clear. I don't know much about __GI_ indirection either. Redefining errno to call the __GI_ version would probably be too easy. :-) > Still, It Would Be Nice™ if you _didn't_ have to have the debug > symbols for libc.so installed for this to work. Probably a lot of > people think you only need those if you are debugging libc itself. I guess that that could also be seen as a packaging issue. May be it'd be possible to (always) have some kind of minimal debug info available for libc, with only the definitions of public API functions, describing their prototypes, but not their internals? See: https://fedoraproject.org/wiki/Features/MiniDebugInfo > >> The next problem is that without debug info for __errno_location, >> gdb has no clue of its prototype, only that its a function, and so >> it assumes that it has type "int()", i.e., that it returns int, >> while in reality it returns int * / __error_t *. (Falling back >> to assuming "int" is IMO not the best idea, but I don't know >> the history behind it.) > > Probably because that's the pre-C89 legacy default function signature? Yes, most likely. >> One dirty way around it would be for the printer to >> re-define the errno macro (using to cast __errno_location to >> the correct type before calling it, I guess: >> >> (gdb) macro define errno *(*(__error_t *(*) (void)) __errno_location) () >> >> That's make "errno" available when you compile with levels >> lower than -g3, too. > > Hmm. How would one do that from inside Python? There's no direct Python API, I believe. You'd just call the CLI command directly, with gdb.execute. xmethods sounds like something that maybe might be useful here: https://sourceware.org/gdb/onlinedocs/gdb/Xmethods-In-Python.html though from the docs it sounds like you can only replace class methods, not free functions, currently. Not sure, have never written any. Thanks, Pedro Alves
On 06/30/2017 07:11 PM, Pedro Alves wrote: > On 06/30/2017 06:27 PM, Zack Weinberg wrote: > >>> One dirty way around it would be for the printer to >>> re-define the errno macro (using to cast __errno_location to >>> the correct type before calling it, I guess: >>> >>> (gdb) macro define errno *(*(__error_t *(*) (void)) __errno_location) () >>> >>> That's make "errno" available when you compile with levels >>> lower than -g3, too. >> >> Hmm. How would one do that from inside Python? > > There's no direct Python API, I believe. You'd just call the CLI > command directly, with gdb.execute. > > xmethods sounds like something that maybe might be useful here: > https://sourceware.org/gdb/onlinedocs/gdb/Xmethods-In-Python.html > though from the docs it sounds like you can only replace class > methods, not free functions, currently. Not sure, have never written any. BTW, it'd be very nice if the printer could replace the "#define errno *__errno_location()" with an alternative implementation that would avoid the function call, so that "print errno": #1 - would also work when debugging core dumps #2 - is just plain safer. Having gdb call functions in the inferior always carries the risk of corrupting an already corrupt inferior even more. Thanks, Pedro Alves
On Friday 30 June 2017 10:57 PM, Zack Weinberg wrote: > The _primary_ goal of the glibc patches is to trigger a pretty-printer > when someone does > > (gdb) p errno > > with no further adornment. Since pretty-printers are keyed by type > (and since people do store errno codes in other places than errno), > this involves a type 'error_t' (and its implementation-namespace alias > '__error_t'), but if we can't get 'p errno' to work, we're probably > not going to bother. > > In all currently-supported configurations, this is glibc's definition of errno: > > extern int *__errno_location (void) __THROW __attribute__ ((__const__)); > #define errno (*__errno_location ()) > > (__THROW is a macro expanding to 'throw ()' when compiling C++.) > > The patches change that to > > extern __error_t *__errno_location (void) __THROW __attribute__ ((__const__)); > > which should be sufficient to get the pretty-printing happening. But > obviously that's only going to work if GDB actually knows that the > return type of __errno_location is __error_t*, and it doesn't seem to, > *even when debug information for the definition is available*: > > $ gdb /lib/x86_64-linux-gnu/libc.so.6 > GNU gdb (Debian 7.12-6) 7.12.0.20161007-git > ... > > Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols > from /usr/lib/debug/.build-id/cc/80584889db7a969292959a46c718a2b1500702.debug...done. > done. > (gdb) ptype __errno_location > type = int () > (gdb) p __errno_location > $1 = {<text variable, no debug info>} 0x20590 <__GI___errno_location> > > ... a suspicion occurs... > > (gdb) ptype __GI___errno_location > type = int *(void) > (gdb) p __GI___errno_location > $2 = {int *(void)} 0x20590 <__GI___errno_location> > > ... so I guess this is a problem with the __GI_ indirection, which > *may* be a thing we can resolve on our end. I don't fully understand > that stuff. > > Still, It Would Be Nice™ if you _didn't_ have to have the debug > symbols for libc.so installed for this to work. Probably a lot of > people think you only need those if you are debugging libc itself. The __GI_* alias is an internal alias of __errno_location and I've seen this before with other symbols, where a function address resolves to the internal alias instead of the public one in gdb as well as other places like objdump. It might make sense to turn this around, but I suspect there may be a reason for it that I am unaware of. Siddhesh
On 07/01/2017 03:35 PM, Siddhesh Poyarekar wrote: > On Friday 30 June 2017 10:57 PM, Zack Weinberg wrote: >> Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols >> from /usr/lib/debug/.build-id/cc/80584889db7a969292959a46c718a2b1500702.debug...done. >> done. >> (gdb) ptype __errno_location >> type = int () >> (gdb) p __errno_location >> $1 = {<text variable, no debug info>} 0x20590 <__GI___errno_location> >> >> ... a suspicion occurs... >> >> (gdb) ptype __GI___errno_location >> type = int *(void) >> (gdb) p __GI___errno_location >> $2 = {int *(void)} 0x20590 <__GI___errno_location> >> >> ... so I guess this is a problem with the __GI_ indirection, which >> *may* be a thing we can resolve on our end. I don't fully understand >> that stuff. > > The __GI_* alias is an internal alias of __errno_location and I've seen > this before with other symbols, where a function address resolves to the > internal alias instead of the public one in gdb as well as other places > like objdump. It might make sense to turn this around, but I suspect > there may be a reason for it that I am unaware of. > I look at this a bit today, and sent a gdb patch to handle this better now: https://sourceware.org/ml/gdb-patches/2017-07/msg00018.html I pushed it to the same branch as the other one: users/palves/whatis. Thanks, Pedro Alves
On 06/30/2017 07:11 PM, Pedro Alves wrote: > On 06/30/2017 06:27 PM, Zack Weinberg wrote: >> Pedro Alves wrote: >>> The next problem is that without debug info for __errno_location, >>> gdb has no clue of its prototype, only that its a function, and so >>> it assumes that it has type "int()", i.e., that it returns int, >>> while in reality it returns int * / __error_t *. (Falling back >>> to assuming "int" is IMO not the best idea, but I don't know >>> the history behind it.) >> >> Probably because that's the pre-C89 legacy default function signature? > > Yes, most likely. FYI, shortly after this discussion, yet another user showed up on #gdb confused by "p getenv("PATH") returning weird negative integer numbers [because he had no debug info for getenv...], so I decided to do something about it. I've now sent a patch series that stops GDB from assuming no-debug-info symbols have type int: [PATCH 00/13] No-debug-info debugging improvements https://sourceware.org/ml/gdb-patches/2017-07/msg00112.html Comments welcome. Thanks, Pedro Alves
On 07/13/2017 03:30 AM, Pedro Alves wrote: > On 06/30/2017 07:11 PM, Pedro Alves wrote: >> On 06/30/2017 06:27 PM, Zack Weinberg wrote: >>> Pedro Alves wrote: >>>> The next problem is that without debug info for __errno_location, >>>> gdb has no clue of its prototype, only that its a function, and so >>>> it assumes that it has type "int()", i.e., that it returns int, >>>> while in reality it returns int * / __error_t *. (Falling back >>>> to assuming "int" is IMO not the best idea, but I don't know >>>> the history behind it.) >>> >>> Probably because that's the pre-C89 legacy default function signature? >> >> Yes, most likely. > > FYI, shortly after this discussion, yet another user showed up > on #gdb confused by "p getenv("PATH") returning weird negative > integer numbers [because he had no debug info for getenv...], so I > decided to do something about it. I've now sent a patch series > that stops GDB from assuming no-debug-info symbols have > type int: > > [PATCH 00/13] No-debug-info debugging improvements > https://sourceware.org/ml/gdb-patches/2017-07/msg00112.html > > Comments welcome. FYI, this is now all in gdb master. I believe all the gdb issues uncovered by the errno printer have been addressed. Let me know if you're aware of something still not working properly. Thanks, Pedro Alves
On Mon, Sep 4, 2017 at 5:25 PM, Pedro Alves <palves@redhat.com> wrote: > > FYI, this is now all in gdb master. I believe all the gdb issues > uncovered by the errno printer have been addressed. Let me know > if you're aware of something still not working properly. I'm sorry I never got around to experimenting with your patches before now. With gdb master as of earlier today, and my patched glibc that tries to pretty-print errno, I can confirm that nearly everything works as desired. `p (error_t) 0` invokes the pretty-printer, and when preprocessor macro bodies are available to the debugger (-ggdb3) so does `p errno`. Unfortunately I am still getting this error message when I try to print the underlying thread-local errno variable (which is what `p errno` does if macro bodies are not available): Cannot find thread-local storage for process 4367, executable file /home/zack/projects/glibc/glibc-build/stdlib/test-errno-printer: Cannot find thread-local variables on this target I don't understand why thread-local variables are inaccessible on my perfectly ordinary x86_64-unknown-linux-gnu workstation (the base OS is Debian 'stretch'). Do you have any idea what might be wrong? zw
On 09/05/2017 10:15 PM, Zack Weinberg wrote: > On Mon, Sep 4, 2017 at 5:25 PM, Pedro Alves <palves@redhat.com> wrote: >> >> FYI, this is now all in gdb master. I believe all the gdb issues >> uncovered by the errno printer have been addressed. Let me know >> if you're aware of something still not working properly. > > I'm sorry I never got around to experimenting with your patches before now. Really no worries. > With gdb master as of earlier today, and my patched glibc that tries > to pretty-print errno, I can confirm that nearly everything works as > desired. `p (error_t) 0` invokes the pretty-printer, and when > preprocessor macro bodies are available to the debugger (-ggdb3) so > does `p errno`. Unfortunately I am still getting this error message > when I try to print the underlying thread-local errno variable (which > is what `p errno` does if macro bodies are not available): > > Cannot find thread-local storage for process 4367, executable file > /home/zack/projects/glibc/glibc-build/stdlib/test-errno-printer: > Cannot find thread-local variables on this target > > I don't understand why thread-local variables are inaccessible on my > perfectly ordinary x86_64-unknown-linux-gnu workstation (the base OS > is Debian 'stretch'). Do you have any idea what might be wrong? I assume your test program isn't actually linked with -pthread? When you do "print errno" in this situation, because there's no "#define errno ..." in sight, gdb ends up finding the real "errno" symbol, which, even though the program isn't threaded, is a TLS symbol, and as such has a DWARF location expression describing its location as an offset into the thread-local block for the current thread. GDB needs to resolve that address, and for threaded programs that is normally done with assistance from libthread_db.so. The problem is then that libthread_db.so only works with programs that link with libpthread.so, and if your test program is actually non-threaded, it doesn't link with libpthread.so, So without libthread_db.so's assistance, gdb cannot "find [the address of] thread-local variables on this target". The error message is coming from generic GDB "get address of tls var" code several layers above GNU/Linux-specific libthread_db.so-interaction code, but still it could probably be made clearer, maybe by adding "the address of". A workaround specifically for errno, and only for live-process debugging [*] is the "macro define" trick I had suggested before: (gdb) macro define errno (*__errno_location ()) After that, "p errno" ends up calling __errno_location just like when you compile the test program with -g3. [*] doesn't work with core file / postmortem debugging. I don't know whether GDB could be able to resolve the location of the errno variable for the main thread (for single-threaded programs): - without libthread_db.so assistance and - without baking in awareness of glibc internal data structures If there is I'd absolutely love to learn about it. Thanks, Pedro Alves
On Tue, Sep 5, 2017 at 6:31 PM, Pedro Alves <palves@redhat.com> wrote: > On 09/05/2017 10:15 PM, Zack Weinberg wrote: >> >> I don't understand why thread-local variables are inaccessible on my >> perfectly ordinary x86_64-unknown-linux-gnu workstation (the base OS >> is Debian 'stretch'). Do you have any idea what might be wrong? > > I assume your test program isn't actually linked with -pthread? That is correct. > When you do "print errno" in this situation, because there's no > "#define errno ..." in sight, gdb ends up finding the real "errno" symbol, > which, even though the program isn't threaded, is a TLS symbol, and as such has > a DWARF location expression describing its location as an offset into the > thread-local block for the current thread. GDB needs to resolve that address, and > for threaded programs that is normally done with assistance from libthread_db.so. > The problem is then that libthread_db.so only works with programs that > link with libpthread.so, and if your test program is actually non-threaded, > it doesn't link with libpthread.so. I am not familiar with the glibc-side TLS implementation, nor with libthread_db.so, nor the code in GDB that uses libthread_db.so. However, reading the implementation of td_thr_tls_get_addr leads me to believe that that function is *supposed* to work even if libpthread.so has not been loaded into the 'inferior'. If it doesn't, perhaps that is a bug on our side. Do you know if GDB even tries? It's not obvious to me looking at linux-thread-db.c. > A workaround specifically for errno, and only for live-process debugging [*] > is the "macro define" trick I had suggested before: > > (gdb) macro define errno (*__errno_location ()) > > After that, "p errno" ends up calling __errno_location just > like when you compile the test program with -g3. Again, is it possible to do (the equivalent of) this from the initialization code of a pretty-printer module? Specifically, there already exists a Python function that's doing this: def register(objfile): """Register pretty printers for the current objfile.""" printer = gdb.printing.RegexpCollectionPrettyPrinter("glibc-errno") printer.add_printer('error_t', r'^(?:__)?error_t', ErrnoPrinter) if objfile == None: objfile = gdb gdb.printing.register_pretty_printer(objfile, printer) called when the module is loaded; what would I need to add to that so that the macro is defined (if it isn't already)? zw
On 09/06/2017 02:05 PM, Zack Weinberg wrote: > On Tue, Sep 5, 2017 at 6:31 PM, Pedro Alves <palves@redhat.com> wrote: >> On 09/05/2017 10:15 PM, Zack Weinberg wrote: >>> >>> I don't understand why thread-local variables are inaccessible on my >>> perfectly ordinary x86_64-unknown-linux-gnu workstation (the base OS >>> is Debian 'stretch'). Do you have any idea what might be wrong? >> >> I assume your test program isn't actually linked with -pthread? > > That is correct. > >> When you do "print errno" in this situation, because there's no >> "#define errno ..." in sight, gdb ends up finding the real "errno" symbol, >> which, even though the program isn't threaded, is a TLS symbol, and as such has >> a DWARF location expression describing its location as an offset into the >> thread-local block for the current thread. GDB needs to resolve that address, and >> for threaded programs that is normally done with assistance from libthread_db.so. >> The problem is then that libthread_db.so only works with programs that >> link with libpthread.so, and if your test program is actually non-threaded, >> it doesn't link with libpthread.so. > > I am not familiar with the glibc-side TLS implementation, nor with > libthread_db.so, nor the code in GDB that uses libthread_db.so. > However, reading the implementation of td_thr_tls_get_addr leads me to > believe that that function is *supposed* to work even if libpthread.so > has not been loaded into the 'inferior'. If it doesn't, perhaps that > is a bug on our side. Do you know if GDB even tries? It's not obvious > to me looking at linux-thread-db.c. GDB only tries to load libthread_db.so if it detects libpthread.so loaded in the inferior. gdb/linux-thread-db.c:thread_db_new_objfile is called for every shared library found in the inferior. However, if we hack gdb like this to force it to always try to load libthread_db.so: diff --git a/gdb/linux-thread-db.c b/gdb/linux-thread-db.c index 6d98135..6cf634e 100644 --- a/gdb/linux-thread-db.c +++ b/gdb/linux-thread-db.c @@ -1000,8 +1000,11 @@ thread_db_new_objfile (struct objfile *objfile) For dynamically linked executables, libpthread can be near the end of the list of shared libraries to load, and in an app of several thousand shared libraries, this can otherwise be painful. */ +#if 0 && ((objfile->flags & OBJF_MAINLINE) != 0 - || libpthread_name_p (objfile_name (objfile)))) + || libpthread_name_p (objfile_name (objfile))) +#endif + ) check_for_thread_db (); } we get: (gdb) set debug libthread-db 1 (gdb) nosharedlibrary (gdb) sharedlibrary Reading symbols from /lib64/libc.so.6...Reading symbols from /usr/lib/debug/usr/lib64/libc-2.22.so.debug...done. done. Trying host libthread_db library: libthread_db.so.1. Host libthread_db.so.1 resolved to: /lib64/libthread_db.so.1. td_ta_new failed: application not linked with libthread thread_db_load_search returning 0 ... That "td_ta_new failed: application not linked with libthread" error is output by thread_db_err_str in linux-thread-db.c. It's just pretty-printing TD_NOLIBTHREAD. I.e., opening a connection to libthread_db.so fails: /* Now attempt to open a connection to the thread library. */ err = info->td_ta_new_p (&info->proc_handle, &info->thread_agent); if (err != TD_OK) { Because lithread_db.so itself "rejects" the inferior. If we repeat the exercise with a breakpoint on gdb's ps_pglobal_lookup (from proc-service.c, which implements the libthread_db.so -> gdb callback interface), we see: [note to self: IWBN to make "set debug libthread-db 1" print these look ups] ... (gdb) sharedlibrary Reading symbols from /lib64/libc.so.6...Reading symbols from /usr/lib/debug/usr/lib64/libc-2.22.so.debug...done. done. Trying host libthread_db library: libthread_db.so.1. Host libthread_db.so.1 resolved to: /lib64/libthread_db.so.1. Breakpoint 3, ps_pglobal_lookup (ph=0x1f04100, obj=0x7fffe58c50f8 "libpthread.so.0", name=0x7fffe58c5221 "nptl_version", sym_addr=0x7fffffffcf78) at src/gdb/proc-service.c:113 113 struct inferior *inf = find_inferior_ptid (ph->ptid); (top-gdb) c Continuing. td_ta_new failed: application not linked with libthread thread_db_load_search returning 0 ... So in order to check whether the inferior is running a matching glibc version, libthread_db.so looks for a "nptl_version" symbol, which is only defined in libpthread.so, so not found in the inferior. I did not try to hack gdb to return some address that happens to point at a value that happens to match the expected version. I think that for that experiment, hacking libthread_db.so itself to skip the checks would likely be easier [and I can't afford to do it right now]. > >> A workaround specifically for errno, and only for live-process debugging [*] >> is the "macro define" trick I had suggested before: >> >> (gdb) macro define errno (*__errno_location ()) >> >> After that, "p errno" ends up calling __errno_location just >> like when you compile the test program with -g3. > > Again, is it possible to do (the equivalent of) this from the > initialization code of a pretty-printer module? Specifically, there > already exists a Python function that's doing this: > > def register(objfile): > """Register pretty printers for the current objfile.""" > > printer = gdb.printing.RegexpCollectionPrettyPrinter("glibc-errno") > printer.add_printer('error_t', r'^(?:__)?error_t', ErrnoPrinter) > > if objfile == None: > objfile = gdb > > gdb.printing.register_pretty_printer(objfile, printer) > > called when the module is loaded; what would I need to add to that so > that the macro is defined (if it isn't already)? I'm hoping that other people more experienced with the gdb Python API can chime in. My idea was just to call gdb.execute ("macro define errno (*(int *) __errno_location ())") somewhere around your Python code. Thanks, Pedro Alves
On Wed, Sep 6, 2017 at 9:31 AM, Pedro Alves <palves@redhat.com> wrote: > On 09/06/2017 02:05 PM, Zack Weinberg wrote: >> I am not familiar with the glibc-side TLS implementation, nor with >> libthread_db.so, nor the code in GDB that uses libthread_db.so. >> However, reading the implementation of td_thr_tls_get_addr leads me to >> believe that that function is *supposed* to work even if libpthread.so >> has not been loaded into the 'inferior'. If it doesn't, perhaps that >> is a bug on our side. Do you know if GDB even tries? It's not obvious >> to me looking at linux-thread-db.c. > > GDB only tries to load libthread_db.so if it detects libpthread.so loaded > in the inferior. gdb/linux-thread-db.c:thread_db_new_objfile is called for > every shared library found in the inferior. > > However, if we hack gdb like this to force it to always try to > load libthread_db.so: ... > That "td_ta_new failed: application not linked with libthread" > error is output by thread_db_err_str in linux-thread-db.c. It's > just pretty-printing TD_NOLIBTHREAD. I.e., opening a connection > to libthread_db.so fails: > > /* Now attempt to open a connection to the thread library. */ > err = info->td_ta_new_p (&info->proc_handle, &info->thread_agent); > if (err != TD_OK) > { > > Because lithread_db.so itself "rejects" the inferior. So, changes to both gdb and libthread_db seem to be required here. I do think that _in principle_ it ought to be possible to use libthread_db to retrieve the address of thread-local data even if the inferior is not linked with libpthread; glibc has quite a few thread-specific variables (errno most prominent, of course, but also h_errno, _res, etc), and so might any library which can be used from both single- and multithreaded programs. This is really not code I feel comfortable hacking up, though, and it's probably more of a project than I have time for, in any case. ... >> called when the module is loaded; what would I need to add to that so >> that the macro is defined (if it isn't already)? > > I'm hoping that other people more experienced with the gdb > Python API can chime in. My idea was just to call > gdb.execute ("macro define errno (*(int *) __errno_location ())") > somewhere around your Python code. I'll tinker with that. Thanks. zw
On 09/06/2017 10:03 PM, Zack Weinberg wrote: > So, changes to both gdb and libthread_db seem to be required here. I > do think that _in principle_ it ought to be possible to use > libthread_db to retrieve the address of thread-local data even if the > inferior is not linked with libpthread; glibc has quite a few > thread-specific variables (errno most prominent, of course, but also > h_errno, _res, etc), and so might any library which can be used from > both single- and multithreaded programs. > > This is really not code I feel comfortable hacking up, though, and > it's probably more of a project than I have time for, in any case. Sounds like a promising approach though. I'd like to see this path explored a bit more. I'll keep this in my TODO, even though it's not likely to bubble up very soon. Thanks for the discussion/ideas! > > ... >>> called when the module is loaded; what would I need to add to that so >>> that the macro is defined (if it isn't already)? >> >> I'm hoping that other people more experienced with the gdb >> Python API can chime in. My idea was just to call >> gdb.execute ("macro define errno (*(int *) __errno_location ())") >> somewhere around your Python code. > > I'll tinker with that. Thanks. Thanks, Pedro Alves
diff --git a/gdb/eval.c b/gdb/eval.c index 2a39774..6aa2499 100644 --- a/gdb/eval.c +++ b/gdb/eval.c @@ -2735,8 +2735,10 @@ evaluate_subexp_standard (struct type *expect_type, result because we do not want to dig past all typedefs. */ check_typedef (type); +#if 0 if (TYPE_CODE (type) == TYPE_CODE_TYPEDEF) type = TYPE_TARGET_TYPE (type); +#endif return allocate_value (type); } else diff --git a/gdb/valops.c b/gdb/valops.c index 8675e6c..058fe1e 100644 --- a/gdb/valops.c +++ b/gdb/valops.c @@ -378,6 +378,8 @@ value_cast (struct type *type, struct value *arg2) /* We deref the value and then do the cast. */ return value_cast (type, coerce_ref (arg2)); + struct type *org_type = type; + type = check_typedef (type); code1 = TYPE_CODE (type); arg2 = coerce_ref (arg2); @@ -452,14 +454,14 @@ value_cast (struct type *type, struct value *arg2) && (code2 == TYPE_CODE_STRUCT || code2 == TYPE_CODE_UNION) && TYPE_NAME (type) != 0) { - struct value *v = value_cast_structs (type, arg2); + struct value *v = value_cast_structs (org_type, arg2); if (v) return v; } if (code1 == TYPE_CODE_FLT && scalar) - return value_from_double (type, value_as_double (arg2)); + return value_from_double (org_type, value_as_double (arg2)); else if (code1 == TYPE_CODE_DECFLOAT && scalar) { enum bfd_endian byte_order = gdbarch_byte_order (get_type_arch (type)); @@ -475,7 +477,7 @@ value_cast (struct type *type, struct value *arg2) /* The only option left is an integral type. */ decimal_from_integral (arg2, dec, dec_len, byte_order); - return value_from_decfloat (type, dec); + return value_from_decfloat (org_type, dec); } else if ((code1 == TYPE_CODE_INT || code1 == TYPE_CODE_ENUM || code1 == TYPE_CODE_RANGE) @@ -496,7 +498,7 @@ value_cast (struct type *type, struct value *arg2) gdbarch_byte_order (get_type_arch (type2))); else longest = value_as_long (arg2); - return value_from_longest (type, convert_to_boolean ? + return value_from_longest (org_type, convert_to_boolean ? (LONGEST) (longest ? 1 : 0) : longest); } else if (code1 == TYPE_CODE_PTR && (code2 == TYPE_CODE_INT @@ -522,14 +524,14 @@ value_cast (struct type *type, struct value *arg2) || longest <= -((LONGEST) 1 << addr_bit)) warning (_("value truncated")); } - return value_from_longest (type, longest); + return value_from_longest (org_type, longest); } else if (code1 == TYPE_CODE_METHODPTR && code2 == TYPE_CODE_INT && value_as_long (arg2) == 0) { - struct value *result = allocate_value (type); + struct value *result = allocate_value (org_type); - cplus_make_method_ptr (type, value_contents_writeable (result), 0, 0); + cplus_make_method_ptr (org_type, value_contents_writeable (result), 0, 0); return result; } else if (code1 == TYPE_CODE_MEMBERPTR && code2 == TYPE_CODE_INT @@ -537,7 +539,7 @@ value_cast (struct type *type, struct value *arg2) { /* The Itanium C++ ABI represents NULL pointers to members as minus one, instead of biasing the normal case. */ - return value_from_longest (type, -1); + return value_from_longest (org_type, -1); } else if (code1 == TYPE_CODE_ARRAY && TYPE_VECTOR (type) && code2 == TYPE_CODE_ARRAY && TYPE_VECTOR (type2) @@ -548,21 +550,21 @@ value_cast (struct type *type, struct value *arg2) error (_("can only cast scalar to vector of same size")); else if (code1 == TYPE_CODE_VOID) { - return value_zero (type, not_lval); + return value_zero (org_type, not_lval); } else if (TYPE_LENGTH (type) == TYPE_LENGTH (type2)) { if (code1 == TYPE_CODE_PTR && code2 == TYPE_CODE_PTR) - return value_cast_pointers (type, arg2, 0); + return value_cast_pointers (org_type, arg2, 0); arg2 = value_copy (arg2); - deprecated_set_value_type (arg2, type); - set_value_enclosing_type (arg2, type); + deprecated_set_value_type (arg2, org_type); + set_value_enclosing_type (arg2, org_type); set_value_pointed_to_offset (arg2, 0); /* pai: chk_val */ return arg2; } else if (VALUE_LVAL (arg2) == lval_memory) - return value_at_lazy (type, value_address (arg2)); + return value_at_lazy (org_type, value_address (arg2)); else { error (_("Invalid cast."));