[Libatomic,Darwin] Initial libatomic port for darwin.

Hello Richard, Joseph,

Thanks for your reviews,

On 13 Nov 2014, at 07:40, Richard Henderson wrote:

> On 11/12/2014 10:18 PM, Iain Sandoe wrote:
> 
>> #  ifndef USE_ATOMIC
>> #    define USE_ATOMIC 1
>> #  endif
> 
> Why would USE_ATOMIC be defined previously?

This was left-over from a mode where I allowed the User to jam the mode to OSSpinLocks to test the performance.

I apologise, [weak excuse follows] with the turbulence of Darwin on trunk, my trunk version of the patch had got behind my 4.9 one. (most of the work has been hammered out there while we try to get bootstrap restored).

re-synced and retested with a patched trunk that bootstraps with some band-aid.

>> inline static void LockUnlock(uint32_t *l) {
>>   __atomic_store_4((_Atomic(uint32_t)*)l, 0, __ATOMIC_RELEASE);
>> }
> 
> Gnu coding style, please.  All through the file here.

Fixed.

> #  define LOCK_SIZE sizeof(uint32_t)
>> #  define NLOCKS (PAGE_SIZE / LOCK_SIZE)
>> static uint32_t locks[NLOCKS];
> 
> Um, surely not LOCK_SIZE, but CACHELINE_SIZE.  It's the granularity of the
> target region that's at issue, not the size of the lock itself.

The algorithm I've used is intentionally different from the pthreads-based posix one, here's the rationale, as I see it:

/* Algorithm motivations.

Layout Assumptions:
  o Darwin has a number of sub-targets with common atomic types that have no
    'native' in-line handling, but are smaller than a cache-line.
    E.G. PPC32 needs locking for >= 8byte quantities, X86/m32 for >=16.

  o The _Atomic alignment of a "natural type" is no greater than the type size.

  o There are no special guarantees about the alignment of _Atomic aggregates
    other than those determined by the psABI.

  o There are no guarantees that placement of an entity won't cause it to
    straddle a cache-line boundary.

  o Realistic User code will likely place several _Atomic qualified types in
    close proximity (such that they fall within the same cache-line).
    Similarly, arrays of _Atomic qualified items.

Performance Assumptions:
  o Collisions of address hashes for items (which make up the lock keys)
    constitute the largest performance issue.

  o We want to avoid unnecessary flushing of [lock table] cache-line(s) when
    items are accessed.

Implementation:
  We maintain a table of locks, each lock being 4 bytes (at present). 
  This choice of lock size gives some measure of equality in hash-collision
  statistics between the 'atomic' and 'OSSpinLock' implementations, since the
  lock size is fixed at 4 bytes for the latter.

  The table occupies one physical page, and we attempt to align it to a
  page boundary, appropriately.

  For entities that need a lock, with sizes < one cache line:
    Each entity that requires a lock, chooses the lock to use from the table on
  the basis of a hash determined by its size and address.  The lower log2(size)
  address bits are discarded on the assumption that the alignment of entities
  will not be smaller than their size.
  CHECKME: this is not verified for aggregates; it might be something that
  could/should be enforced from the front ends (since _Atomic types are
  allowed to have increased alignment c.f. 'normal').

  For entities that need a lock, with sizes >= one cacheline_size:
    We assume that the entity alignment >= log2(cacheline_size) and discard
  log2(cacheline_size) bits from the address.
  We then apply size/cacheline_size locks to cover the entity.

  The idea is that this will typically result in distinct hash keys for items
  placed close together.  The keys are mangled further such that the size is
  included in the hash.

  Finally, to attempt to make it such that the lock table entries are accessed
  in a scattered manner,to avoid repeated cacheline flushes, the hash is
  rearranged to attempt to maximise the most noise in the upper bits.
*/

NOTE that the CHECKME above doesn't put us in any worse position than the pthreads implementation (likely slightly better since we have a smaller granularity with the current scheme).

>> #if USE_ATOMIC
>>   LockLock (&locks[addr_hash (ptr, 1)]);
>> #else
>>   OSSpinLockLock(&locks[addr_hash (ptr, 1)]);
>> #endif
> 
> Better to #define LockLock  OSSpinLockLock within the area above, so as to
> avoid the ifdefs here.
done.

Thoughts on the rationale - or OK now?

thanks
Iain

I'm not aware of any other PRs that relate, but will do a final scan through and ask around the darwin folks.

libatomic:

	PR target/59305
	* config/darwin/host-config.h New.
	* config/darwin/lock.c New.
	* configure.tgt (DEFAULT_X86_CPU): New, (target): New entry for darwin.

[Libatomic,Darwin] Initial libatomic port for darwin.

Commit Message

Comments

Patch

[Libatomic,Darwin] Initial libatomic port for *darwin*.

Commit Message

Comments

Patch

[Libatomic,Darwin] Initial libatomic port for darwin.