Message ID | 1453767942-19369-52-git-send-email-joern@purestorage.com |
---|---|
State | New |
Headers | show |
On 26/01/16 00:25, Joern Engel wrote: > With signals we can reenter the thread-cache. Protect against that with > a lock. Will almost never happen in practice, it took the company five > years to reproduce a similar race in the existing malloc. But easy to > trigger with a targeted test. why do you try to make malloc as-safe? isn't it better to fix malloc usage in signal handlers?
On 01/26/2016 03:44 PM, Szabolcs Nagy wrote: > On 26/01/16 00:25, Joern Engel wrote: >> With signals we can reenter the thread-cache. Protect against that with >> a lock. Will almost never happen in practice, it took the company five >> years to reproduce a similar race in the existing malloc. But easy to >> trigger with a targeted test. > > why do you try to make malloc as-safe? > > isn't it better to fix malloc usage in signal handlers? FYI I once tried to check OSS for violations of signal handler requirements (see https://github.com/yugr/sigcheck). It turned out that many packages (ab)use malloc in sighandlers (directly or via printf et al.). Fixing (or even finding) all badly behaving software out there would be a huge amount of work so "fixing" it on the library side instead may make sense. -Y
On 01/26/2016 01:44 PM, Szabolcs Nagy wrote: > On 26/01/16 00:25, Joern Engel wrote: >> With signals we can reenter the thread-cache. Protect against that with >> a lock. Will almost never happen in practice, it took the company five >> years to reproduce a similar race in the existing malloc. But easy to >> trigger with a targeted test. > > why do you try to make malloc as-safe? > > isn't it better to fix malloc usage in signal handlers? We have functionality like dprintf, syslog, backtrace, C++ thread-local object access which might be used from signal handlers, but which can call malloc. Fixing this in malloc may seem attractive, but I doubt it can be made completely reliable by concentrating changes there. Florian
On 01/26/2016 04:14 PM, Florian Weimer wrote: > On 01/26/2016 01:44 PM, Szabolcs Nagy wrote: >> On 26/01/16 00:25, Joern Engel wrote: >>> With signals we can reenter the thread-cache. Protect against that with >>> a lock. Will almost never happen in practice, it took the company five >>> years to reproduce a similar race in the existing malloc. But easy to >>> trigger with a targeted test. >> >> why do you try to make malloc as-safe? >> >> isn't it better to fix malloc usage in signal handlers? > > We have functionality like dprintf, syslog, backtrace, C++ thread-local > object access which might be used from signal handlers, but which can > call malloc. Right. One can argue that this is an error and someone should go fix the code but in my experience most real-world signal handlers have these problems and noone is going to rewrite them any time soon. > Fixing this in malloc may seem attractive, but I doubt it > can be made completely reliable by concentrating changes there. > > Florian > > >
On 26/01/16 13:23, Yury Gribov wrote: > On 01/26/2016 04:14 PM, Florian Weimer wrote: >> On 01/26/2016 01:44 PM, Szabolcs Nagy wrote: >>> On 26/01/16 00:25, Joern Engel wrote: >>>> With signals we can reenter the thread-cache. Protect against that with >>>> a lock. Will almost never happen in practice, it took the company five >>>> years to reproduce a similar race in the existing malloc. But easy to >>>> trigger with a targeted test. >>> >>> why do you try to make malloc as-safe? >>> >>> isn't it better to fix malloc usage in signal handlers? >> >> We have functionality like dprintf, syslog, backtrace, C++ thread-local >> object access which might be used from signal handlers, but which can >> call malloc. > > Right. One can argue that this is an error and someone should go fix the code but in my experience most > real-world signal handlers have these problems and noone is going to rewrite them any time soon. > it is easy to write non-conforming code, but that does not mean the libc should try to make it work. in this case signals should be blocked whenever the library is entered through non-as-safe api calls. that has a significant cost for portable code. >> Fixing this in malloc may seem attractive, but I doubt it >> can be made completely reliable by concentrating changes there. >> >> Florian >> >> >> >
On Tue, Jan 26, 2016 at 12:44:57PM +0000, Szabolcs Nagy wrote: > On 26/01/16 00:25, Joern Engel wrote: > > With signals we can reenter the thread-cache. Protect against that with > > a lock. Will almost never happen in practice, it took the company five > > years to reproduce a similar race in the existing malloc. But easy to > > trigger with a targeted test. > > why do you try to make malloc as-safe? Why would I not? I find it easier to make malloc signal-safe than to find an alternative fix for the signal handler of a single application. If nothing else, the cost of entering the signal handler a million times is much lower for a malloc stresstest than an application. > isn't it better to fix malloc usage in signal handlers? If your goal is portability across many different malloc libraries, I would agree. If your goal is to fix deadlocks in applications, changing malloc is a more economical use of your time. Jörn -- Functionality is an asset, but code is a liability. --Ted Dziuba
On 26 Jan 2016 13:40, Szabolcs Nagy wrote: > On 26/01/16 13:23, Yury Gribov wrote: > > On 01/26/2016 04:14 PM, Florian Weimer wrote: > >> On 01/26/2016 01:44 PM, Szabolcs Nagy wrote: > >>> On 26/01/16 00:25, Joern Engel wrote: > >>>> With signals we can reenter the thread-cache. Protect against that with > >>>> a lock. Will almost never happen in practice, it took the company five > >>>> years to reproduce a similar race in the existing malloc. But easy to > >>>> trigger with a targeted test. > >>> > >>> why do you try to make malloc as-safe? > >>> > >>> isn't it better to fix malloc usage in signal handlers? > >> > >> We have functionality like dprintf, syslog, backtrace, C++ thread-local > >> object access which might be used from signal handlers, but which can > >> call malloc. > > > > Right. One can argue that this is an error and someone should go fix the code but in my experience most > > real-world signal handlers have these problems and noone is going to rewrite them any time soon. > > > > it is easy to write non-conforming code, but that > does not mean the libc should try to make it work. and that has a real cost for real time code, as well as adding real overhead to hot paths. we spend so much time on trying to get lock less code, just to re-add overhead by calling into the kernel ? > in this case signals should be blocked whenever the > library is entered through non-as-safe api calls. > that has a significant cost for portable code. err, this statement makes no sense. if you were writing portable code, you wouldn't be using malloc or other non-as safe code in the signal handler in the first place. -mike
On Wed, Jan 27, 2016 at 10:05:33AM +0000, Szabolcs Nagy wrote: > > portable code does not need the fix, but all users > of the api are affected by the overhead of the fix. Sorry, but you have no idea what you are talking about. The overhead of the fix is _negative_. Users _want_ to be affected. The reason why free() is not signalsafe is that it spins on an arena-lock. If the same thread already holds that lock, you have a classic deadlock. By not spinning on the lock you make code run faster. You also fix the signal-induced deadlock. Ok, for thread-cache there is some overhead. Without signal-safety you wouldn't need a lock for the thread-cache at all. But here I call bullshit again, because I had the same concerns. Then I measured and could not demonstrate any performance impact. "If you cannot measure it, it does not exist." You might disagree philosophically, but if an engineers goes out of his way to measure a certain effect and finds nothing, that effect hardly matters in any practical way. Jörn -- Only children believe that ideas are simply born. Usually two other ideas have sex. -- Jon Gnarr
On Wed, 2016-01-27 at 09:44 -0800, Jörn Engel wrote: > On Wed, Jan 27, 2016 at 10:05:33AM +0000, Szabolcs Nagy wrote: > > > > portable code does not need the fix, but all users > > of the api are affected by the overhead of the fix. > > Sorry, but you have no idea what you are talking about. The overhead of > the fix is _negative_. Users _want_ to be affected. > > The reason why free() is not signalsafe is that it spins on an > arena-lock. If the same thread already holds that lock, you have a > classic deadlock. By not spinning on the lock you make code run faster. > You also fix the signal-induced deadlock. > > Ok, for thread-cache there is some overhead. Without signal-safety you > wouldn't need a lock for the thread-cache at all. But here I call > bullshit again, because I had the same concerns. Then I measured and > could not demonstrate any performance impact. > > "If you cannot measure it, it does not exist." > > You might disagree philosophically, but if an engineers goes out of his > way to measure a certain effect and finds nothing, that effect hardly > matters in any practical way. Then it should be sufficient if you can describe what you actually measured to convince Szabolcs; if your measurement is as good as you seem to say it is, it should be obvious why his (and your prior) concerns are unfounded. Up to now, you just stated that you did measure something. I'd also like to kindly point out that we're not on LKML here. We all will understand you better if you provide sounds arguments and provide the necessary background information instead of if you "call bullshit".
On Wed, Jan 27, 2016 at 08:19:02PM +0100, Torvald Riegel wrote: > > Then it should be sufficient if you can describe what you actually > measured to convince Szabolcs; if your measurement is as good as you > seem to say it is, it should be obvious why his (and your prior) > concerns are unfounded. Up to now, you just stated that you did measure > something. I believe I ran my testsuite (see tarball from the other subthread) with and without locking for the thread cache. Signal torture obviously didn't finish without locking. The performance impact for the others was well in the noise. Maybe with a few hundred repetitions I could have extracted some non-noise signal, but I was happy with that result. Since my testsuite spends >50% of the time in malloc, any real-world results would get diluted even further. > I'd also like to kindly point out that we're not on LKML here. We all > will understand you better if you provide sounds arguments and provide > the necessary background information instead of if you "call bullshit". Fair. I have little patience for theoretical concerns that block real improvements based on zero evidence. But I shouldn't drop my manners because of that. Apologies. Jörn -- The cheapest, fastest and most reliable components of a computer system are those that aren't there. -- Gordon Bell, DEC labratories
On 01/27/2016 12:44 PM, Jörn Engel wrote: > On Wed, Jan 27, 2016 at 10:05:33AM +0000, Szabolcs Nagy wrote: >> >> portable code does not need the fix, but all users >> of the api are affected by the overhead of the fix. > > Sorry, but you have no idea what you are talking about. The overhead of > the fix is _negative_. Users _want_ to be affected. > > The reason why free() is not signalsafe is that it spins on an > arena-lock. If the same thread already holds that lock, you have a > classic deadlock. By not spinning on the lock you make code run faster. > You also fix the signal-induced deadlock. > > Ok, for thread-cache there is some overhead. Without signal-safety you > wouldn't need a lock for the thread-cache at all. But here I call > bullshit again, because I had the same concerns. Then I measured and > could not demonstrate any performance impact. > > "If you cannot measure it, it does not exist." > > You might disagree philosophically, but if an engineers goes out of his > way to measure a certain effect and finds nothing, that effect hardly > matters in any practical way. My mantra is slightly different: "If others cannot measure it, your performance gains don't exist." We are always looking for microbenchmark contributions that others can run objectively to evaluate any such changes. Cheers, Carlos.
diff --git a/tpc/malloc2.13/malloc-machine.h b/tpc/malloc2.13/malloc-machine.h index 07072f5d5e11..a03b78bf3c1a 100644 --- a/tpc/malloc2.13/malloc-machine.h +++ b/tpc/malloc2.13/malloc-machine.h @@ -65,6 +65,9 @@ static inline int mutex_lock(mutex_t *m) { } } } +/* Returns 0 on success, 1 on error + XXX: This is the opposite of the Linux kernel's mutex_trylock + primitive, making it confusing and error-prone. */ static inline int mutex_trylock(mutex_t *m) { int r; diff --git a/tpc/malloc2.13/tcache.h b/tpc/malloc2.13/tcache.h index e267ce905ed0..ece03fc464cd 100644 --- a/tpc/malloc2.13/tcache.h +++ b/tpc/malloc2.13/tcache.h @@ -90,6 +90,13 @@ static inline int get_bit(unsigned long *bitmap, int i) } struct thread_cache { + /* + * Since the cache is per-thread, it should not need a lock. + * The reason we have one anyway is signal handlers, which can + * cause the same thread to enter this code twice concurrently. + */ + mutex_t mutex; + /* Bytes in cache */ uint32_t tc_size; @@ -264,9 +271,13 @@ static void *tcache_malloc(size_t size) if (!cache) return NULL; memset(cache, 0, sizeof(*cache)); + mutex_init(&cache->mutex); tsd_setspecific(cache_key, cache); } + if (mutex_trylock(&cache->mutex)) + return NULL; + bin_no = cache_bin(nb); assert(bin_no < CACHE_NO_BINS); set_accessed(cache, bin_no); @@ -281,6 +292,7 @@ static void *tcache_malloc(size_t size) } if (cache_bin(chunksize(*bin)) != bin_no) { malloc_printerr(check_action, "invalid tcache entry", victim); + mutex_unlock(&cache->mutex); return NULL; } *bin = victim->fd; @@ -288,6 +300,7 @@ static void *tcache_malloc(size_t size) cache->tc_size -= chunksize(victim); cache->tc_count--; alloc_perturb(p, size); + mutex_unlock(&cache->mutex); return p; } @@ -301,8 +314,10 @@ static void *tcache_malloc(size_t size) tcache_gc(cache); arena = arena_get(size); - if (!arena) + if (!arena) { + mutex_unlock(&cache->mutex); return NULL; + } free_atomic_list(arena); /* TODO: _int_malloc does checked_request2size() again, which is silly */ victim = _int_malloc(arena, size); @@ -335,6 +350,7 @@ static void *tcache_malloc(size_t size) arena_unlock(arena); assert(!victim || arena == arena_for_chunk(mem2chunk(victim))); alloc_perturb(victim, size); + mutex_unlock(&cache->mutex); return victim; } @@ -346,12 +362,16 @@ static void tcache_free(mchunkptr p) size_t size; int bin_no; + size = chunksize(p); + if (size > MAX_CACHED_SIZE) + goto uncached_free; + tsd_getspecific(cache_key, cache); if (!cache) goto uncached_free; - size = chunksize(p); - if (size > MAX_CACHED_SIZE) + if (mutex_trylock(&cache->mutex)) goto uncached_free; + bin_no = cache_bin(size); assert(bin_no < CACHE_NO_BINS); @@ -360,16 +380,18 @@ static void tcache_free(mchunkptr p) bin = &cache->tc_bin[bin_no]; if (*bin == p) { malloc_printerr(check_action, "double free or corruption (tcache)", chunk2mem(p)); - return; + goto out; } if (*bin && cache_bin(chunksize(*bin)) != bin_no) { malloc_printerr(check_action, "invalid tcache entry", chunk2mem(p)); - return; + goto out; } free_perturb(p, size - 2 * SIZE_SZ); add_to_bin(bin, p); if (cache->tc_size > CACHE_SIZE) tcache_gc(cache); + out: + mutex_unlock(&cache->mutex); return; uncached_free: