Message ID | 1497627984.6717.32.camel@pbcl.net |
---|---|
State | New |
Headers | show |
On 06/16/2017 05:46 PM, Phil Blundell wrote:
> Opinions?
Is this related to bug 20874?
https://sourceware.org/bugzilla/show_bug.cgi?id=20874
Thanks,
Florian
Note that test-driver.c should now be used instead of test-skeleton.c.
On 06/16/2017 08:46 AM, Phil Blundell wrote: > getaddrinfo_a() doesn't appear to be part of any > standard, and it also isn't documented in the glibc manual sockets.texi is pretty out-of-date (to be nice). It's hard to try and write networking code using the glibc manual because so much of what you see in use isn't in there anywhere. I started working on getaddrinfo, sockaddr_storage, etc., a while back, and have a few patches laying around somewhere I need to dust off, but anything substantial with the chapter is likely in the 2.27-2.28 timeline for me, unless someone else gets to it first. It's definitely on the radar though. Rical
On Fri, 2017-06-16 at 17:52 +0200, Florian Weimer wrote: > On 06/16/2017 05:46 PM, Phil Blundell wrote: > > Opinions? > > Is this related to bug 20874? > > https://sourceware.org/bugzilla/show_bug.cgi?id=20874 > No, I don't think it's related. I poked at that bug a bit and I am fairly sure that the problem there is specifically in gai_suspend(). Under conditions that I don't entirely understand yet, we seem to be somehow returning from gai_suspend while its waitlist[] entry is still linked into requestlist->waiting. Since gai_suspend() allocated its waitlist[] on the stack, this leads rapidly to disaster when gai_notify subsequently tries to traverse the linked list. p.
On 06/19/2017 01:04 PM, Phil Blundell wrote: > On Fri, 2017-06-16 at 17:52 +0200, Florian Weimer wrote: >> On 06/16/2017 05:46 PM, Phil Blundell wrote: >>> Opinions? >> >> Is this related to bug 20874? >> >> https://sourceware.org/bugzilla/show_bug.cgi?id=20874 >> > > No, I don't think it's related. I poked at that bug a bit and I am > fairly sure that the problem there is specifically in gai_suspend(). > Under conditions that I don't entirely understand yet, we seem to be > somehow returning from gai_suspend while its waitlist[] entry is still > linked into requestlist->waiting. Since gai_suspend() allocated its > waitlist[] on the stack, this leads rapidly to disaster when gai_notify > subsequently tries to traverse the linked list. Okay, thanks for investigating. How does the memory leak happen? Would another notification eventually deallocate the struct async_waitlist object? Thanks, Florian
On Mon, 2017-06-19 at 13:51 +0200, Florian Weimer wrote: > How does the memory leak happen? Would another notification > eventually > deallocate the struct async_waitlist object? The async_waitlist is allocated in getaddrinfo_a and, under normal circumstances, freed by this dubious-looking code in __gai_notify(): /* This is tricky. See getaddrinfo_a.c for the reason why this works. */ free ((void *) waitlist->counterp); If gai_cancel() removes the entry from the request queue then, as the code stands today, nothing will cause __gai_notify() to be called for it and hence the async_waitlist is never freed. I don't think there is any mechanism that will cause another notification to eventually deallocate the memory. But, I ran a testcase (basically identical to the tst-leaks3 that I posted before) under valgrind and it considers those blocks to be "possibly lost" rather than the "definitely lost" that I was expecting, so perhaps there is some internal object that tracks them after all. I do find all the twisty little data structures involved in getaddrinfo_a() particularly hard to keep straight in my head so it's entirely possible I have overlooked something. p.
On Mon, 2017-06-19 at 13:27 +0100, Phil Blundell wrote: > If gai_cancel() removes the entry from the request queue then, as the > code stands today, nothing will cause __gai_notify() to be called for > it and hence the async_waitlist is never freed. From inspection of the code I think there is another leak too: __gai_remove_request() unlinks "runp" from the request list, but as far as I can tell there is nothing that will ever put it back in the freelist so that element is essentially lost. This leak is not immediately obvious from the mtrace or valgrind output, because all the "struct requestlist" objects live in a single block that was calloc()ed in get_elem() and the block itself is not lost, but if you run a test that simply calls getaddrinfo_a() and gai_cancel() in a loop then you can see that the process size keeps growing without any obvious bound. (On my machine it reached 4G of rss within a couple of seconds at which point I killed it.) p.
On Sat, 17 Jun 2017, Rical Jasan wrote: > On 06/16/2017 08:46 AM, Phil Blundell wrote: > > getaddrinfo_a() doesn't appear to be part of any > > standard, and it also isn't documented in the glibc manual > > sockets.texi is pretty out-of-date (to be nice). It's hard to try and > write networking code using the glibc manual because so much of what you > see in use isn't in there anywhere. I started working on getaddrinfo, > sockaddr_storage, etc., a while back, and have a few patches laying > around somewhere I need to dust off, but anything substantial with the > chapter is likely in the 2.27-2.28 timeline for me, unless someone else > gets to it first. Note the undocumentedness of getaddrinfo / getnameinfo is bug 17202 (I'm sure there's a lot more undocumented than there are open bugs for, however).
From 11ec70a0bcd3bef4595020ada88303d349884f22 Mon Sep 17 00:00:00 2001 From: Phil Blundell <pb@pbcl.net> Date: Fri, 16 Jun 2017 16:36:52 +0100 Subject: [PATCH] resolv/tst-leaks3.c: New test --- resolv/Makefile | 15 +++++++++++---- resolv/tst-leaks3.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 65 insertions(+), 4 deletions(-) create mode 100644 resolv/tst-leaks3.c diff --git a/resolv/Makefile b/resolv/Makefile index dc597ca097..22926a9e5b 100644 --- a/resolv/Makefile +++ b/resolv/Makefile @@ -31,9 +31,9 @@ routines := herror inet_addr inet_ntop inet_pton nsap_addr res_init \ res_hconf res_libc res-state tests = tst-aton tst-leaks tst-inet_ntop -xtests = tst-leaks2 +xtests = tst-leaks2 tst-leaks3 -generate := mtrace-tst-leaks.out tst-leaks.mtrace tst-leaks2.mtrace +generate := mtrace-tst-leaks.out tst-leaks.mtrace tst-leaks2.mtrace tst-leaks3.mtrace extra-libs := libresolv libnss_dns ifeq ($(have-thread-library),yes) @@ -95,7 +95,7 @@ endif ifeq ($(run-built-tests),yes) ifneq (no,$(PERL)) tests-special += $(objpfx)mtrace-tst-leaks.out -xtests-special += $(objpfx)mtrace-tst-leaks2.out +xtests-special += $(objpfx)mtrace-tst-leaks2.out $(objpfx)mtrace-tst-leaks3.out endif endif @@ -107,7 +107,8 @@ headers += rpc/netdb.h endif generated += mtrace-tst-leaks.out tst-leaks.mtrace \ - mtrace-tst-leaks2.out tst-leaks2.mtrace + mtrace-tst-leaks2.out tst-leaks2.mtrace \ + mtrace-tst-leaks3.out tst-leaks3.mtrace include ../Rules @@ -135,6 +136,12 @@ $(objpfx)mtrace-tst-leaks2.out: $(objpfx)tst-leaks2.out $(common-objpfx)malloc/mtrace $(objpfx)tst-leaks2.mtrace > $@; \ $(evaluate-test) +$(objpfx)tst-leaks3: $(objpfx)libresolv.so $(objpfx)libanl.so +tst-leaks3-ENV = MALLOC_TRACE=$(objpfx)tst-leaks3.mtrace +$(objpfx)mtrace-tst-leaks3.out: $(objpfx)tst-leaks3.out + $(common-objpfx)malloc/mtrace $(objpfx)tst-leaks3.mtrace > $@; \ + $(evaluate-test) + $(objpfx)tst-bug18665-tcp: $(objpfx)libresolv.so $(shared-thread-library) $(objpfx)tst-bug18665: $(objpfx)libresolv.so $(shared-thread-library) $(objpfx)tst-res_use_inet6: $(objpfx)libresolv.so $(shared-thread-library) diff --git a/resolv/tst-leaks3.c b/resolv/tst-leaks3.c new file mode 100644 index 0000000000..93f06e9e84 --- /dev/null +++ b/resolv/tst-leaks3.c @@ -0,0 +1,54 @@ +/* Tests for getaddrinfo_a + Copyright (C) 2017 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#include <mcheck.h> +#include <netdb.h> +#include <resolv.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> + +static int +do_test (void) +{ + mtrace (); + + for (int i = 0; i < 1000; ++i) + { + struct gaicb *req[1]; + req[0] = malloc (sizeof (*req[0])); + memset (req[0], 0, sizeof (*req[0])); + req[0]->ar_name = "www.gnu.org"; + + int ret = getaddrinfo_a (GAI_NOWAIT, req, 1, NULL); + if (ret != 0) + { + fprintf (stderr, "getaddrinfo_a() failed: %s\n", + gai_strerror(ret)); + exit (EXIT_FAILURE); + } + + gai_cancel (req[0]); + } + return 0; +} + +#define TEST_FUNCTION do_test () +#define TIMEOUT 30 +/* This defines the `main' function and some more. */ +#include <test-skeleton.c> -- 2.11.0