diff mbox

malloc: add locking to thread cache

Message ID 1453767942-19369-52-git-send-email-joern@purestorage.com
State New
Headers show

Commit Message

Jörn Engel Jan. 26, 2016, 12:25 a.m. UTC
With signals we can reenter the thread-cache.  Protect against that with
a lock.  Will almost never happen in practice, it took the company five
years to reproduce a similar race in the existing malloc.  But easy to
trigger with a targeted test.

JIRA: PURE-27597
---
 tpc/malloc2.13/malloc-machine.h |  3 +++
 tpc/malloc2.13/tcache.h         | 32 +++++++++++++++++++++++++++-----
 2 files changed, 30 insertions(+), 5 deletions(-)

Comments

Szabolcs Nagy Jan. 26, 2016, 12:44 p.m. UTC | #1
On 26/01/16 00:25, Joern Engel wrote:
> With signals we can reenter the thread-cache.  Protect against that with
> a lock.  Will almost never happen in practice, it took the company five
> years to reproduce a similar race in the existing malloc.  But easy to
> trigger with a targeted test.

why do you try to make malloc as-safe?

isn't it better to fix malloc usage in signal handlers?
Yury Gribov Jan. 26, 2016, 1:14 p.m. UTC | #2
On 01/26/2016 03:44 PM, Szabolcs Nagy wrote:
> On 26/01/16 00:25, Joern Engel wrote:
>> With signals we can reenter the thread-cache.  Protect against that with
>> a lock.  Will almost never happen in practice, it took the company five
>> years to reproduce a similar race in the existing malloc.  But easy to
>> trigger with a targeted test.
>
> why do you try to make malloc as-safe?
>
> isn't it better to fix malloc usage in signal handlers?

FYI I once tried to check OSS for violations of signal handler 
requirements (see https://github.com/yugr/sigcheck).  It turned out that 
many packages (ab)use malloc in sighandlers (directly or via printf et 
al.).  Fixing (or even finding) all badly behaving software out there 
would be a huge amount of work so "fixing" it on the library side 
instead may make sense.

-Y
Florian Weimer Jan. 26, 2016, 1:14 p.m. UTC | #3
On 01/26/2016 01:44 PM, Szabolcs Nagy wrote:
> On 26/01/16 00:25, Joern Engel wrote:
>> With signals we can reenter the thread-cache.  Protect against that with
>> a lock.  Will almost never happen in practice, it took the company five
>> years to reproduce a similar race in the existing malloc.  But easy to
>> trigger with a targeted test.
> 
> why do you try to make malloc as-safe?
> 
> isn't it better to fix malloc usage in signal handlers?

We have functionality like dprintf, syslog, backtrace, C++ thread-local
object access which might be used from signal handlers, but which can
call malloc.  Fixing this in malloc may seem attractive, but I doubt it
can be made completely reliable by concentrating changes there.

Florian
Yury Gribov Jan. 26, 2016, 1:23 p.m. UTC | #4
On 01/26/2016 04:14 PM, Florian Weimer wrote:
> On 01/26/2016 01:44 PM, Szabolcs Nagy wrote:
>> On 26/01/16 00:25, Joern Engel wrote:
>>> With signals we can reenter the thread-cache.  Protect against that with
>>> a lock.  Will almost never happen in practice, it took the company five
>>> years to reproduce a similar race in the existing malloc.  But easy to
>>> trigger with a targeted test.
>>
>> why do you try to make malloc as-safe?
>>
>> isn't it better to fix malloc usage in signal handlers?
>
> We have functionality like dprintf, syslog, backtrace, C++ thread-local
> object access which might be used from signal handlers, but which can
> call malloc.

Right.  One can argue that this is an error and someone should go fix 
the code but in my experience most real-world signal handlers have these 
problems and noone is going to rewrite them any time soon.

> Fixing this in malloc may seem attractive, but I doubt it
> can be made completely reliable by concentrating changes there.
>
> Florian
>
>
>
Szabolcs Nagy Jan. 26, 2016, 1:40 p.m. UTC | #5
On 26/01/16 13:23, Yury Gribov wrote:
> On 01/26/2016 04:14 PM, Florian Weimer wrote:
>> On 01/26/2016 01:44 PM, Szabolcs Nagy wrote:
>>> On 26/01/16 00:25, Joern Engel wrote:
>>>> With signals we can reenter the thread-cache.  Protect against that with
>>>> a lock.  Will almost never happen in practice, it took the company five
>>>> years to reproduce a similar race in the existing malloc.  But easy to
>>>> trigger with a targeted test.
>>>
>>> why do you try to make malloc as-safe?
>>>
>>> isn't it better to fix malloc usage in signal handlers?
>>
>> We have functionality like dprintf, syslog, backtrace, C++ thread-local
>> object access which might be used from signal handlers, but which can
>> call malloc.
> 
> Right.  One can argue that this is an error and someone should go fix the code but in my experience most
> real-world signal handlers have these problems and noone is going to rewrite them any time soon.
> 

it is easy to write non-conforming code, but that
does not mean the libc should try to make it work.

in this case signals should be blocked whenever the
library is entered through non-as-safe api calls.
that has a significant cost for portable code.

>> Fixing this in malloc may seem attractive, but I doubt it
>> can be made completely reliable by concentrating changes there.
>>
>> Florian
>>
>>
>>
>
Jörn Engel Jan. 26, 2016, 5:40 p.m. UTC | #6
On Tue, Jan 26, 2016 at 12:44:57PM +0000, Szabolcs Nagy wrote:
> On 26/01/16 00:25, Joern Engel wrote:
> > With signals we can reenter the thread-cache.  Protect against that with
> > a lock.  Will almost never happen in practice, it took the company five
> > years to reproduce a similar race in the existing malloc.  But easy to
> > trigger with a targeted test.
> 
> why do you try to make malloc as-safe?

Why would I not?  I find it easier to make malloc signal-safe than to
find an alternative fix for the signal handler of a single application.
If nothing else, the cost of entering the signal handler a million times
is much lower for a malloc stresstest than an application.

> isn't it better to fix malloc usage in signal handlers?

If your goal is portability across many different malloc libraries, I
would agree.  If your goal is to fix deadlocks in applications, changing
malloc is a more economical use of your time.

Jörn

--
Functionality is an asset, but code is a liability.
--Ted Dziuba
Mike Frysinger Jan. 26, 2016, 6 p.m. UTC | #7
On 26 Jan 2016 13:40, Szabolcs Nagy wrote:
> On 26/01/16 13:23, Yury Gribov wrote:
> > On 01/26/2016 04:14 PM, Florian Weimer wrote:
> >> On 01/26/2016 01:44 PM, Szabolcs Nagy wrote:
> >>> On 26/01/16 00:25, Joern Engel wrote:
> >>>> With signals we can reenter the thread-cache.  Protect against that with
> >>>> a lock.  Will almost never happen in practice, it took the company five
> >>>> years to reproduce a similar race in the existing malloc.  But easy to
> >>>> trigger with a targeted test.
> >>>
> >>> why do you try to make malloc as-safe?
> >>>
> >>> isn't it better to fix malloc usage in signal handlers?
> >>
> >> We have functionality like dprintf, syslog, backtrace, C++ thread-local
> >> object access which might be used from signal handlers, but which can
> >> call malloc.
> > 
> > Right.  One can argue that this is an error and someone should go fix the code but in my experience most
> > real-world signal handlers have these problems and noone is going to rewrite them any time soon.
> > 
> 
> it is easy to write non-conforming code, but that
> does not mean the libc should try to make it work.

and that has a real cost for real time code, as well as adding real
overhead to hot paths.  we spend so much time on trying to get lock
less code, just to re-add overhead by calling into the kernel ?

> in this case signals should be blocked whenever the
> library is entered through non-as-safe api calls.
> that has a significant cost for portable code.

err, this statement makes no sense.  if you were writing portable
code, you wouldn't be using malloc or other non-as safe code in
the signal handler in the first place.
-mike
Jörn Engel Jan. 27, 2016, 5:44 p.m. UTC | #8
On Wed, Jan 27, 2016 at 10:05:33AM +0000, Szabolcs Nagy wrote:
> 
> portable code does not need the fix, but all users
> of the api are affected by the overhead of the fix.

Sorry, but you have no idea what you are talking about.  The overhead of
the fix is _negative_.  Users _want_ to be affected.

The reason why free() is not signalsafe is that it spins on an
arena-lock.  If the same thread already holds that lock, you have a
classic deadlock.  By not spinning on the lock you make code run faster.
You also fix the signal-induced deadlock.

Ok, for thread-cache there is some overhead.  Without signal-safety you
wouldn't need a lock for the thread-cache at all.  But here I call
bullshit again, because I had the same concerns.  Then I measured and
could not demonstrate any performance impact.

"If you cannot measure it, it does not exist."

You might disagree philosophically, but if an engineers goes out of his
way to measure a certain effect and finds nothing, that effect hardly
matters in any practical way.

Jörn

--
Only children believe that ideas are simply born.  Usually two other
ideas have sex.
-- Jon Gnarr
Torvald Riegel Jan. 27, 2016, 7:19 p.m. UTC | #9
On Wed, 2016-01-27 at 09:44 -0800, Jörn Engel wrote:
> On Wed, Jan 27, 2016 at 10:05:33AM +0000, Szabolcs Nagy wrote:
> > 
> > portable code does not need the fix, but all users
> > of the api are affected by the overhead of the fix.
> 
> Sorry, but you have no idea what you are talking about.  The overhead of
> the fix is _negative_.  Users _want_ to be affected.
> 
> The reason why free() is not signalsafe is that it spins on an
> arena-lock.  If the same thread already holds that lock, you have a
> classic deadlock.  By not spinning on the lock you make code run faster.
> You also fix the signal-induced deadlock.
> 
> Ok, for thread-cache there is some overhead.  Without signal-safety you
> wouldn't need a lock for the thread-cache at all.  But here I call
> bullshit again, because I had the same concerns.  Then I measured and
> could not demonstrate any performance impact.
> 
> "If you cannot measure it, it does not exist."
> 
> You might disagree philosophically, but if an engineers goes out of his
> way to measure a certain effect and finds nothing, that effect hardly
> matters in any practical way.

Then it should be sufficient if you can describe what you actually
measured to convince Szabolcs; if your measurement is as good as you
seem to say it is, it should be obvious why his (and your prior)
concerns are unfounded.  Up to now, you just stated that you did measure
something.

I'd also like to kindly point out that we're not on LKML here.  We all
will understand you better if you provide sounds arguments and provide
the necessary background information instead of if you "call bullshit".
Jörn Engel Jan. 27, 2016, 7:42 p.m. UTC | #10
On Wed, Jan 27, 2016 at 08:19:02PM +0100, Torvald Riegel wrote:
> 
> Then it should be sufficient if you can describe what you actually
> measured to convince Szabolcs; if your measurement is as good as you
> seem to say it is, it should be obvious why his (and your prior)
> concerns are unfounded.  Up to now, you just stated that you did measure
> something.

I believe I ran my testsuite (see tarball from the other subthread) with
and without locking for the thread cache.  Signal torture obviously
didn't finish without locking.  The performance impact for the others
was well in the noise.  Maybe with a few hundred repetitions I could
have extracted some non-noise signal, but I was happy with that result.

Since my testsuite spends >50% of the time in malloc, any real-world
results would get diluted even further.

> I'd also like to kindly point out that we're not on LKML here.  We all
> will understand you better if you provide sounds arguments and provide
> the necessary background information instead of if you "call bullshit".

Fair.

I have little patience for theoretical concerns that block real
improvements based on zero evidence.  But I shouldn't drop my manners
because of that.  Apologies.

Jörn

--
The cheapest, fastest and most reliable components of a computer
system are those that aren't there.
-- Gordon Bell, DEC labratories
Carlos O'Donell Jan. 27, 2016, 9:36 p.m. UTC | #11
On 01/27/2016 12:44 PM, Jörn Engel wrote:
> On Wed, Jan 27, 2016 at 10:05:33AM +0000, Szabolcs Nagy wrote:
>>
>> portable code does not need the fix, but all users
>> of the api are affected by the overhead of the fix.
> 
> Sorry, but you have no idea what you are talking about.  The overhead of
> the fix is _negative_.  Users _want_ to be affected.
> 
> The reason why free() is not signalsafe is that it spins on an
> arena-lock.  If the same thread already holds that lock, you have a
> classic deadlock.  By not spinning on the lock you make code run faster.
> You also fix the signal-induced deadlock.
> 
> Ok, for thread-cache there is some overhead.  Without signal-safety you
> wouldn't need a lock for the thread-cache at all.  But here I call
> bullshit again, because I had the same concerns.  Then I measured and
> could not demonstrate any performance impact.
> 
> "If you cannot measure it, it does not exist."
> 
> You might disagree philosophically, but if an engineers goes out of his
> way to measure a certain effect and finds nothing, that effect hardly
> matters in any practical way.

My mantra is slightly different:

"If others cannot measure it, your performance gains don't exist."

We are always looking for microbenchmark contributions that others
can run objectively to evaluate any such changes.

Cheers,
Carlos.
diff mbox

Patch

diff --git a/tpc/malloc2.13/malloc-machine.h b/tpc/malloc2.13/malloc-machine.h
index 07072f5d5e11..a03b78bf3c1a 100644
--- a/tpc/malloc2.13/malloc-machine.h
+++ b/tpc/malloc2.13/malloc-machine.h
@@ -65,6 +65,9 @@  static inline int mutex_lock(mutex_t *m) {
     }
   }
 }
+/* Returns 0 on success, 1 on error
+   XXX: This is the opposite of the Linux kernel's mutex_trylock
+   primitive, making it confusing and error-prone. */
 static inline int mutex_trylock(mutex_t *m) {
   int r;
 
diff --git a/tpc/malloc2.13/tcache.h b/tpc/malloc2.13/tcache.h
index e267ce905ed0..ece03fc464cd 100644
--- a/tpc/malloc2.13/tcache.h
+++ b/tpc/malloc2.13/tcache.h
@@ -90,6 +90,13 @@  static inline int get_bit(unsigned long *bitmap, int i)
 }
 
 struct thread_cache {
+	/*
+	 * Since the cache is per-thread, it should not need a lock.
+	 * The reason we have one anyway is signal handlers, which can
+	 * cause the same thread to enter this code twice concurrently.
+	 */
+	mutex_t mutex;
+
 	/* Bytes in cache */
 	uint32_t tc_size;
 
@@ -264,9 +271,13 @@  static void *tcache_malloc(size_t size)
 		if (!cache)
 			return NULL;
 		memset(cache, 0, sizeof(*cache));
+		mutex_init(&cache->mutex);
 		tsd_setspecific(cache_key, cache);
 	}
 
+	if (mutex_trylock(&cache->mutex))
+		return NULL;
+
 	bin_no = cache_bin(nb);
 	assert(bin_no < CACHE_NO_BINS);
 	set_accessed(cache, bin_no);
@@ -281,6 +292,7 @@  static void *tcache_malloc(size_t size)
 		}
 		if (cache_bin(chunksize(*bin)) != bin_no) {
 			malloc_printerr(check_action, "invalid tcache entry", victim);
+			mutex_unlock(&cache->mutex);
 			return NULL;
 		}
 		*bin = victim->fd;
@@ -288,6 +300,7 @@  static void *tcache_malloc(size_t size)
 		cache->tc_size -= chunksize(victim);
 		cache->tc_count--;
 		alloc_perturb(p, size);
+		mutex_unlock(&cache->mutex);
 		return p;
 	}
 
@@ -301,8 +314,10 @@  static void *tcache_malloc(size_t size)
 		tcache_gc(cache);
 
 	arena = arena_get(size);
-	if (!arena)
+	if (!arena) {
+		mutex_unlock(&cache->mutex);
 		return NULL;
+	}
 	free_atomic_list(arena);
 	/* TODO: _int_malloc does checked_request2size() again, which is silly */
 	victim = _int_malloc(arena, size);
@@ -335,6 +350,7 @@  static void *tcache_malloc(size_t size)
 	arena_unlock(arena);
 	assert(!victim || arena == arena_for_chunk(mem2chunk(victim)));
 	alloc_perturb(victim, size);
+	mutex_unlock(&cache->mutex);
 	return victim;
 }
 
@@ -346,12 +362,16 @@  static void tcache_free(mchunkptr p)
 	size_t size;
 	int bin_no;
 
+	size = chunksize(p);
+	if (size > MAX_CACHED_SIZE)
+		goto uncached_free;
+
 	tsd_getspecific(cache_key, cache);
 	if (!cache)
 		goto uncached_free;
-	size = chunksize(p);
-	if (size > MAX_CACHED_SIZE)
+	if (mutex_trylock(&cache->mutex))
 		goto uncached_free;
+
 	bin_no = cache_bin(size);
 	assert(bin_no < CACHE_NO_BINS);
 
@@ -360,16 +380,18 @@  static void tcache_free(mchunkptr p)
 	bin = &cache->tc_bin[bin_no];
 	if (*bin == p) {
 		malloc_printerr(check_action, "double free or corruption (tcache)", chunk2mem(p));
-		return;
+		goto out;
 	}
 	if (*bin && cache_bin(chunksize(*bin)) != bin_no) {
 		malloc_printerr(check_action, "invalid tcache entry", chunk2mem(p));
-		return;
+		goto out;
 	}
 	free_perturb(p, size - 2 * SIZE_SZ);
 	add_to_bin(bin, p);
 	if (cache->tc_size > CACHE_SIZE)
 		tcache_gc(cache);
+ out:
+	mutex_unlock(&cache->mutex);
 	return;
 
  uncached_free: