Message ID | 87zgo73boe.fsf@oldenburg.str.redhat.com |
---|---|
State | New |
Headers | show |
Series | elf: Simplify software TM implementation in _dl_find_object | expand |
The 01/07/2022 14:41, Florian Weimer wrote: > With the current set of fences, the version update at the start > of the TM write operation is redundant. Also use relaxed MO stores > during the dlclose update, and skip any version changes there. > > Suggested-by: Szabolcs Nagy <szabolcs.nagy@arm.com> > > --- > Initial testing looks good, but I'll keep the aarch64 and powerpc64le > tests running in a loop for a while. looks good to me except > @@ -282,15 +279,6 @@ _dlfo_mappings_end_update (void) > atomic_thread_fence_release (); > __atomic_wide_counter_fetch_add_relaxed (&_dlfo_loaded_mappings_version, 1); > } i think the v += 1 does not have to be atomic so start_version = load_relaxed (&ver); store_relaxed (&ver, start_version + 1); or if we can change the api to pass start_version that may be clearer.
* Szabolcs Nagy: > The 01/07/2022 14:41, Florian Weimer wrote: >> With the current set of fences, the version update at the start >> of the TM write operation is redundant. Also use relaxed MO stores >> during the dlclose update, and skip any version changes there. >> >> Suggested-by: Szabolcs Nagy <szabolcs.nagy@arm.com> >> >> --- >> Initial testing looks good, but I'll keep the aarch64 and powerpc64le >> tests running in a loop for a while. > > looks good to me except > >> @@ -282,15 +279,6 @@ _dlfo_mappings_end_update (void) >> atomic_thread_fence_release (); >> __atomic_wide_counter_fetch_add_relaxed (&_dlfo_loaded_mappings_version, 1); >> } > > i think the v += 1 does not have to be atomic so > > start_version = load_relaxed (&ver); > store_relaxed (&ver, start_version + 1); > > or if we can change the api to pass start_version > that may be clearer. The 32-bit wide counter needs to be atomic. We can't use a straight store in that case. Should I break abstraction and write through for the 64-bit case? Thanks, Florian
The 01/07/2022 16:56, Florian Weimer wrote: > * Szabolcs Nagy: > > > The 01/07/2022 14:41, Florian Weimer wrote: > >> With the current set of fences, the version update at the start > >> of the TM write operation is redundant. Also use relaxed MO stores > >> during the dlclose update, and skip any version changes there. > >> > >> Suggested-by: Szabolcs Nagy <szabolcs.nagy@arm.com> > >> > >> --- > >> Initial testing looks good, but I'll keep the aarch64 and powerpc64le > >> tests running in a loop for a while. > > > > looks good to me except > > > >> @@ -282,15 +279,6 @@ _dlfo_mappings_end_update (void) > >> atomic_thread_fence_release (); > >> __atomic_wide_counter_fetch_add_relaxed (&_dlfo_loaded_mappings_version, 1); > >> } > > > > i think the v += 1 does not have to be atomic so > > > > start_version = load_relaxed (&ver); > > store_relaxed (&ver, start_version + 1); > > > > or if we can change the api to pass start_version > > that may be clearer. > > The 32-bit wide counter needs to be atomic. We can't use a straight > store in that case. > > Should I break abstraction and write through for the 64-bit case? i didn't think about that. but it seems there is a __atomic_wide_counter_add_relaxed which seems to do the right thing.
* Szabolcs Nagy: > The 01/07/2022 16:56, Florian Weimer wrote: >> * Szabolcs Nagy: >> >> > The 01/07/2022 14:41, Florian Weimer wrote: >> >> With the current set of fences, the version update at the start >> >> of the TM write operation is redundant. Also use relaxed MO stores >> >> during the dlclose update, and skip any version changes there. >> >> >> >> Suggested-by: Szabolcs Nagy <szabolcs.nagy@arm.com> >> >> >> >> --- >> >> Initial testing looks good, but I'll keep the aarch64 and powerpc64le >> >> tests running in a loop for a while. >> > >> > looks good to me except >> > >> >> @@ -282,15 +279,6 @@ _dlfo_mappings_end_update (void) >> >> atomic_thread_fence_release (); >> >> __atomic_wide_counter_fetch_add_relaxed (&_dlfo_loaded_mappings_version, 1); >> >> } >> > >> > i think the v += 1 does not have to be atomic so >> > >> > start_version = load_relaxed (&ver); >> > store_relaxed (&ver, start_version + 1); >> > >> > or if we can change the api to pass start_version >> > that may be clearer. >> >> The 32-bit wide counter needs to be atomic. We can't use a straight >> store in that case. >> >> Should I break abstraction and write through for the 64-bit case? > > i didn't think about that. > > but it seems there is a > > __atomic_wide_counter_add_relaxed > > which seems to do the right thing. Good point. I'll restart my tests with that. Thanks, Flroian
diff --git a/elf/dl-find_object.c b/elf/dl-find_object.c index 0eb8607edd..7b906c8f1d 100644 --- a/elf/dl-find_object.c +++ b/elf/dl-find_object.c @@ -123,9 +123,9 @@ struct dlfo_mappings_segment /* To achieve async-signal-safety, two copies of the data structure are used, so that a signal handler can still use this data even if - dlopen or dlclose modify the other copy. The the MSB in - _dlfo_loaded_mappings_version determines which array element is the - currently active region. */ + dlopen or dlclose modify the other copy. The the least significant + bit in _dlfo_loaded_mappings_version determines which array element + is the currently active region. */ static struct dlfo_mappings_segment *_dlfo_loaded_mappings[2]; /* Returns the number of actually used elements in all segments @@ -248,7 +248,7 @@ _dlfo_read_start_version (void) { /* Acquire MO load synchronizes with the fences at the beginning and end of the TM update region in _dlfo_mappings_begin_update, - _dlfo_mappings_end_update, _dlfo_mappings_end_update_no_switch. */ + _dlfo_mappings_end_update. */ return __atomic_wide_counter_load_acquire (&_dlfo_loaded_mappings_version); } @@ -261,16 +261,13 @@ _dlfo_read_version_locked (void) } /* Update the version to reflect that an update is happening. This - does not change the bit that controls the active segment chain. - Returns the index of the currently active segment chain. */ -static inline unsigned int + does not change the bit that controls the active segment chain. */ +static inline void _dlfo_mappings_begin_update (void) { - /* The store synchronizes with loads in _dlfo_read_start_version + /* The fence synchronizes with loads in _dlfo_read_start_version (also called from _dlfo_read_success). */ atomic_thread_fence_release (); - return __atomic_wide_counter_fetch_add_relaxed - (&_dlfo_loaded_mappings_version, 2); } /* Installs the just-updated version as the active version. */ @@ -282,15 +279,6 @@ _dlfo_mappings_end_update (void) atomic_thread_fence_release (); __atomic_wide_counter_fetch_add_relaxed (&_dlfo_loaded_mappings_version, 1); } -/* Completes an in-place update without switching versions. */ -static inline void -_dlfo_mappings_end_update_no_switch (void) -{ - /* The store synchronizes with loads in _dlfo_read_start_version - (also called from _dlfo_read_success). */ - atomic_thread_fence_release (); - __atomic_wide_counter_fetch_add_relaxed (&_dlfo_loaded_mappings_version, 2); -} /* Return true if the read was successful, given the start version. */ @@ -302,10 +290,11 @@ _dlfo_read_success (uint64_t start_version) the TM region are not ordered past the version check below. */ atomic_thread_fence_acquire (); - /* Synchronizes with stores in _dlfo_mappings_begin_update, - _dlfo_mappings_end_update, _dlfo_mappings_end_update_no_switch. - It is important that all stores from the last update have been - visible, and stores from the next updates are not. + /* Synchronizes with the fences in _dlfo_mappings_begin_update, + _dlfo_mappings_end_update. It is important that all stores from + the last update have become visible, and stores from the next + update to this version are not before the version number is + updated. Unlike with seqlocks, there is no check for odd versions here because we have read the unmodified copy (confirmed to be @@ -839,17 +828,10 @@ _dl_find_object_dlclose (struct link_map *map) issues around __libc_freeres. */ return; - /* The update happens in-place, but given that we do not use - atomic accesses on the read side, update the version around - the update to trigger re-validation in concurrent - readers. */ - _dlfo_mappings_begin_update (); - - /* Mark as closed. */ - obj->map_end = obj->map_start; - obj->map = NULL; - - _dlfo_mappings_end_update_no_switch (); + /* Mark as closed. This does not change the overall data + structure, so no TM cycle is needed. */ + atomic_store_relaxed (&obj->map_end, obj->map_start); + atomic_store_relaxed (&obj->map, NULL); return; } }