[1/1] dl-load: add memory barrier before updating the next.

Submitted by Maninder Singh on March 16, 2017, 5:12 a.m.

Details

Message ID 1489641122-35462-1-git-send-email-maninder1.s@samsung.com
State New
Headers show

Commit Message

Maninder Singh March 16, 2017, 5:12 a.m.
This patch adds memory barrier before updating the liblist next.

Issue Fix: race condition between add_name_to_object  & _dl_name_match_p.
One threads calling dlopen which further calls add_name_to_object &
other thread trying to resolve RTLD_LAZY symbols through _dl_runtime_resolve
which further calls.

_dl_name_match_p checks if libname->next is valid, then it assumes
libname->next->name to be valid. Also add_name_to_object initialized name
first and then sets valid next pointer.

This patch avoids any reorder of instruction when next is set before name
to avoid any race.

Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Vaneet Narang <v.narang@samsung.com>
---
 elf/dl-load.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Florian Weimer March 16, 2017, 8:20 a.m.
On 03/16/2017 06:12 AM, Maninder Singh wrote:
> This patch adds memory barrier before updating the liblist next.
>
> Issue Fix: race condition between add_name_to_object  & _dl_name_match_p.
> One threads calling dlopen which further calls add_name_to_object &
> other thread trying to resolve RTLD_LAZY symbols through _dl_runtime_resolve
> which further calls.
>
> _dl_name_match_p checks if libname->next is valid, then it assumes
> libname->next->name to be valid. Also add_name_to_object initialized name
> first and then sets valid next pointer.
>
> This patch avoids any reorder of instruction when next is set before name
> to avoid any race.

Would you please file a bug for this?  We'd also want to write a test 
case for this.

Unfortunately, I cannot comment on the substance of the patch.

Thanks,
Florian
Torvald Riegel March 17, 2017, 2:35 p.m.
On Thu, 2017-03-16 at 10:42 +0530, Maninder Singh wrote:
> This patch adds memory barrier before updating the liblist next.
> 
> Issue Fix: race condition between add_name_to_object  & _dl_name_match_p.
> One threads calling dlopen which further calls add_name_to_object &
> other thread trying to resolve RTLD_LAZY symbols through _dl_runtime_resolve
> which further calls.
> 
> _dl_name_match_p checks if libname->next is valid, then it assumes
> libname->next->name to be valid. Also add_name_to_object initialized name
> first and then sets valid next pointer.
> 
> This patch avoids any reorder of instruction when next is set before name
> to avoid any race.

Thanks for trying to fix concurrency issues.  I don't have the details
of synchronization in the loader paged in, so I'll just reply quickly
for now with a few general comments.

If you haven't yet read it, please have a look at
https://sourceware.org/glibc/wiki/Concurrency

If we fix concurrent code, we generally want to do so by transforming
the code to using C11 atomics (if necessary) and being valid in the C11
memory model.  For example, we want to make sure that it is
data-race-free as defined by C11 (e.g., with your patch and as you
describe the concurrent exceutions that are possible, there would need
to be a data-race involving the store to lastp->next).

If you add something like a C11 release action (ie, the write barrier
you add), then it must be paired with an acquire action somewhere else
to take effect.

We want to document concurrent code better.  In particular, we want to
say which acquire operation a release operation is supposed to
synchronize with, and why we need the happens-before this results in.  

Also, its often useful to say what the "input" concurrency is.  For
example, in this case, which concurrent accesses to the link_map data
structure are possible in a valid program?

If you have further questions about using the C11-like atomics we have,
please ask.  Looking at other concurrent code, for example
nptl/pthread_once.c, can also be helpful to get an idea of what we're
looking for.
Szabolcs Nagy March 17, 2017, 5:18 p.m.
On 16/03/17 05:12, Maninder Singh wrote:
> This patch adds memory barrier before updating the liblist next.
> 
> Issue Fix: race condition between add_name_to_object  & _dl_name_match_p.
> One threads calling dlopen which further calls add_name_to_object &
> other thread trying to resolve RTLD_LAZY symbols through _dl_runtime_resolve
> which further calls.
> 

i assume _dl_name_match_p is called from do_lookup_x which
is called from _dl_fixup at lazy resolution.

> _dl_name_match_p checks if libname->next is valid, then it assumes
> libname->next->name to be valid. Also add_name_to_object initialized name
> first and then sets valid next pointer.
> 
> This patch avoids any reorder of instruction when next is set before name
> to avoid any race.
> 
> Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
> Signed-off-by: Vaneet Narang <v.narang@samsung.com>
> ---
>  elf/dl-load.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/elf/dl-load.c b/elf/dl-load.c
> index 016a99c..c1e69b8 100644
> --- a/elf/dl-load.c
> +++ b/elf/dl-load.c
> @@ -429,6 +429,7 @@ add_name_to_object (struct link_map *l, const char *name)
>    newname->name = memcpy (newname + 1, name, name_len);
>    newname->next = NULL;
>    newname->dont_free = 0;
> +  atomic_write_barrier ();
>    lastp->next = newname;
>  }

ideally lastp->next = newname should be a mo_release
store and wherever map->next is read should be a
mo_acquire load or mo_consume load (if it is followed
by a map->next->something load).

mo_consume is frowned upon because of some
specification and implementation issues so use
mo_acquire.

only adding a barrier on the write side is not
correct (it happens to work on most architectures
because the compiled code of the read side will
have dependent loads that are guaranteed to be
ordered even on some weakly ordered architectures
like arm and aarch64, but on the c language
level there is no such guarantee so the compiler
may break this)
Torvald Riegel March 18, 2017, 6:22 p.m.
On Sat, 2017-03-18 at 09:12 +0000, Vaneet Narang wrote:
> >only adding a barrier on the write side is not
> 
> >correct 
> 
> 
> Reason for adding barrier only in writer path is because reader path has conditional
> check. Since instructions has dependency so Instruction reordering is not possible.
> 
> Writer Code:  add_name_to_object()   |           Reader Code: _dl_name_match_p()
>                                      |
> newname->name = memcpy( ...)         |        struct libname_list *runp = map->l_libname;
> newname->next = NULL;                |        while (runp != NULL)
> newname->dont_free = 0;              |           if (strcmp (name, runp->name) == 0)
> lastp->next = newname;               |             return 1;  
>                                      |          else  
>                                      |             runp = runp->next;

While that may be true for a particular HW memory model, this is not
guaranteed on the level of C code.  (It's unlikely that a compiler would
transform the code so that this wouldn't work in practice, and some
software such as the Linux kernel assumes that this won't ever happen.
Nonetheless, for valid C11 code, you must avoid the data race, and you
must tell the compiler that you need this particular ordering.)

As Szabolcs has mentioned, atomic loads with memory_order_consume as
memory order are intended to be the mechanism that allows a program to
state that it depends on such dependencies; however, there are
specification bugs in how memory_order_consume is currently specified.
Therefore, we'd just use an memory_order_acquire load instead to get the
value for runp in the code above.

> In Reader code if runp is not NULL only then it reads runp->name So i don't think instruction reordering 
> with conditional case is possible so barrier is required.  
> If there is any instruction reordering happens in writer side as lastp->next updated before newname->name 
> then we can face issue in reader side. So this is the reason why i added barrier only in writer side.
> 
> We have been facing issue in reader side where we get runp->next as valid but runp->next->name is NULL
> which results in NULL pointer access in strcmp but when we check coredump then we see runp->next->name
> so we are suspecting race condition.
> I don't see any issue with code as name is updated before next so only reason we can suspect for race 
> is instruction reordering at writer side.

Generally, in glibc we want to reason about concurrent code based on the
C11 memory model.  This ensures that the code is portable across
architectures.
Szabolcs Nagy March 20, 2017, 11:55 a.m.
On 18/03/17 09:12, Vaneet Narang wrote:
> Reason for adding barrier only in writer path is because reader path has conditional
> check. Since instructions has dependency so Instruction reordering is not possible.
> 
> Writer Code:  add_name_to_object()   |           Reader Code: _dl_name_match_p()
>                                      |
> newname->name = memcpy( ...)         |        struct libname_list *runp = map->l_libname;
> newname->next = NULL;                |        while (runp != NULL)
> newname->dont_free = 0;              |           if (strcmp (name, runp->name) == 0)
> lastp->next = newname;               |             return 1;  
>                                      |          else  
>                                      |             runp = runp->next;
> 
>   
...
> We have been facing issue in reader side where we get runp->next as valid but runp->next->name is NULL
> which results in NULL pointer access in strcmp but when we check coredump then we see runp->next->name
> so we are suspecting race condition.
> I don't see any issue with code as name is updated before next so only reason we can suspect for race 
> is instruction reordering at writer side.
> 
...
> We are facing issue on ARM based target. I have verified there
> is no compile time reordering, only reason which i can suspect now is
> runtime reordering because of cache miss on first store instruction. 
> So next store instruction gets executed first. 
> 

this is a data race in the c memory model (runp->next is
read and written concurrently without synchronization),
so the behaviour is undefined.

on arm, a write side barrier or release atomic store
should work in practice (but it's still incorrect e.g.
the compiled code can theoretically read runp->next->name
speculatively before runp->next by guessing the likely
value of runp->next and check the guess later), however
even if it was correct on arm, an arm only solution won't
be accepted in generic c code: the fix has to be correct
in the c memory model as Torvald already noted, see

https://sourceware.org/glibc/wiki/Concurrency

> Please suggest, if there is any other reason possible for issue i explained.
> Also please suggest if synchronization has already been taken care 
> between following threads executing loader code.
> 
>    Thread 1: dlopen  ---->  add_name_to_object
>    Thread 2: _dl_runtime_resolve  ---> _dl_name_match_p
> 

i think the write side needs atomic_store_release
and the read side an atomic_load_acquire and ideally
it should be checked where else this list is accessed
and document the synchronization in comments.

you should at least file a bug report against the
glibc dynamic-linker and in case you provide a patch
reference the bug number.

Patch hide | download patch | download mbox

diff --git a/elf/dl-load.c b/elf/dl-load.c
index 016a99c..c1e69b8 100644
--- a/elf/dl-load.c
+++ b/elf/dl-load.c
@@ -429,6 +429,7 @@  add_name_to_object (struct link_map *l, const char *name)
   newname->name = memcpy (newname + 1, name, name_len);
   newname->next = NULL;
   newname->dont_free = 0;
+  atomic_write_barrier ();
   lastp->next = newname;
 }