cas_lockref does not measure locked lockref #80

@jty2

As described in #76 (comment), https://github.com/ARM-software/synchronization-benchmarks/blob/master/benchmarks/lockhammer/tests/cas_lockref.h provides a simplified representation of Linux kernel lockrefs, which allow opportunistic lockless increment/decrement of a reference count using compare-and-swap when the associated spinlock is not taken.

The lockhammer cas_lockref.h:lock_acquire() is roughly congruent to lockref_get(), and lock_release() is roughly congruent to lockref_put_return(). However, lockref_get() and lockref_put_return() do not create a critical section in the sense that lockhammer assumes for its other synchronization routines through the lock_acquire() and lock_release() API.

  • In lockhammer, generally only the winning thread leaves lock_acquire() to enter the critical section, spends the -c parameter amount of critical time, and then runs lock_release() to leave the critical section.
  • In the Linux kernel, lockref_get() and lockref_put_return() merely increment/decrement the reference count of an associated data structure, and do not provide exclusivity to it (which would be the purpose of the spinlock for the data structure).
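
The contrast between the two bullets above can be shown with a minimal C11 sketch. Every `sketch_*` name here is a hypothetical stand-in and not lockhammer or kernel code: a test-and-set flag plays the role of an exclusive lock, and a plain atomic counter plays the role of the reference count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only. */

/* Exclusive lock: at most one caller gets in until it is released. */
static bool sketch_try_acquire(atomic_flag *lock)
{
    return !atomic_flag_test_and_set(lock);
}

static void sketch_release(atomic_flag *lock)
{
    atomic_flag_clear(lock);
}

/* Reference count: every caller succeeds; each returns the new count. */
static int sketch_ref_get(atomic_int *refs)
{
    return atomic_fetch_add(refs, 1) + 1;
}

static int sketch_ref_put(atomic_int *refs)
{
    return atomic_fetch_sub(refs, 1) - 1;
}

static void sketch_demo(void)
{
    atomic_flag lock = ATOMIC_FLAG_INIT;
    atomic_int refs = 0;

    assert(sketch_try_acquire(&lock));   /* first caller enters */
    assert(!sketch_try_acquire(&lock));  /* second caller is kept out */
    sketch_release(&lock);

    assert(sketch_ref_get(&refs) == 1);  /* first caller proceeds */
    assert(sketch_ref_get(&refs) == 2);  /* so does the second */
    assert(sketch_ref_put(&refs) == 1);
    assert(sketch_ref_put(&refs) == 0);
}
```

With the flag, the second caller cannot proceed until the first releases. With the counter, both callers are "inside" at the same time, so any work placed between a get and a put is not a critical section.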

Beyond that, lockhammer’s cas_lockref implementation has several defects that I think make it non-representative of the Linux lockref implementation:

  • In lockhammer, the lock is never taken (there is no slow path implemented that takes the lock to increment the refcount, i.e. lockref_get_or_lock()), so the (old & 0xFFFFFFFF) test in the while loop condition that tests if the lock has been taken is never true.
  • In lockhammer, there is an additional term in the while loop that checks if the refcount is between 1 and 32, but this never gets tested because there is no code to ever take the lock. Consequently, the loop body that increments/decrements the refcount is never calculated. It’s not clear to me why the while loop tests for the range of values 1-32.
  • In Linux, cmpxchg is used to update the lock_count while the spinlock is not taken; if the spinlock is taken, a slow path acquires the lock to increment/decrement the refcount. However, in lockhammer, the lock_count is always just cas64()’d by all threads regardless of whether or not the lock is taken (and, per the above, it never is).
  • In Linux, the while loop that polls on the lock to cmpxchg the lock_count uses cpu_relax() to pace the polling. However, in lockhammer, there is no pacing using cpu_relax().
    -- cpu_relax() in this loop was added in Linux v3.12 (commit d472d9d98b463dd7a04f2bcdeafe4261686ce6ab) and removed in Linux v6.2 (commit f5fe24ef17b5fbe6db49534163e77499fb10ae8c); the removal was also backported to v5.10.166 (commit 20a02bc845083abe5c7406caa9c6408ab0b2cc76).
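
For reference, the fast-path/slow-path split described in the list above might look roughly like the following user-space C11 sketch. It is not the kernel's code and not lockhammer's: all `sketch_*` names are hypothetical, the 64-bit layout (lock word in the low 32 bits, refcount in the high 32 bits) is an assumption chosen to match the `(old & 0xFFFFFFFF)` test, and the spinlock is a bare test-and-set loop with no pacing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: lock word in the low 32 bits, refcount in the high 32. */
#define SKETCH_LOCK_MASK UINT64_C(0xFFFFFFFF)
#define SKETCH_COUNT_ONE (UINT64_C(1) << 32)

typedef struct {
    _Atomic uint64_t lock_count;
} sketch_lockref;

/* Fast path: CAS the count upward only while the lock half reads as free.
 * Returns false as soon as the lock is observed taken. */
static bool sketch_try_fast_get(sketch_lockref *ref)
{
    uint64_t old = atomic_load(&ref->lock_count);

    while ((old & SKETCH_LOCK_MASK) == 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&ref->lock_count, &old,
                                         old + SKETCH_COUNT_ONE))
            return true;
    }
    return false;
}

/* Minimal test-and-set spinlock stored in the low half of the same word. */
static void sketch_spin_lock(sketch_lockref *ref)
{
    for (;;) {
        /* Expected value has the lock half clear, so the CAS can only
         * succeed while the lock is free. */
        uint64_t old = atomic_load(&ref->lock_count) & ~SKETCH_LOCK_MASK;
        if (atomic_compare_exchange_weak(&ref->lock_count, &old, old | 1))
            return;
    }
}

static void sketch_spin_unlock(sketch_lockref *ref)
{
    assert(atomic_load(&ref->lock_count) & SKETCH_LOCK_MASK);
    atomic_fetch_and(&ref->lock_count, ~SKETCH_LOCK_MASK);
}

/* Try the lockless fast path; fall back to lock, increment, unlock. */
static void sketch_lockref_get(sketch_lockref *ref)
{
    if (sketch_try_fast_get(ref))
        return;
    sketch_spin_lock(ref);
    atomic_fetch_add(&ref->lock_count, SKETCH_COUNT_ONE);
    sketch_spin_unlock(ref);
}

static uint32_t sketch_refcount(sketch_lockref *ref)
{
    return (uint32_t)(atomic_load(&ref->lock_count) >> 32);
}
```

In this sketch, a fast-path CAS cannot succeed while the lock half is nonzero, because its expected value always has the lock half clear. If nothing ever calls `sketch_spin_lock()`, `sketch_try_fast_get()` always succeeds and the whole thing reduces to a repeated 64-bit CAS, which is the situation described for cas_lockref.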

As such, I think the cas_lockref test as-is should just be identified as cas64 (mapping to cmpxchg or CAS instructions) running repeatedly on all threads, and not necessarily as representative of Linux lockrefs. Some of this has to do with lockhammer's main metric being the latency of a single operation (in this case, the increment/decrement of an unlocked refcount) rather than behavior under different lock states. The test could probably be enhanced with a parameter that makes the lock initially taken, but then it may become dominated by spinlock acquisition overhead.
