diff mbox

[v6,04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks

Message ID 51226B46.9080707@linux.vnet.ibm.com
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Srivatsa S. Bhat Feb. 18, 2013, 5:56 p.m. UTC
On 02/18/2013 09:51 PM, Srivatsa S. Bhat wrote:
> Hi Michel,
> 
> On 02/18/2013 09:15 PM, Michel Lespinasse wrote:
>> Hi Srivasta,
>>
>> I admit not having followed in detail the threads about the previous
>> iteration, so some of my comments may have been discussed already
>> before - apologies if that is the case.
>>
>> On Mon, Feb 18, 2013 at 8:38 PM, Srivatsa S. Bhat
>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>> Reader-writer locks and per-cpu counters are recursive, so they can be
>>> used in a nested fashion in the reader-path, which makes per-CPU rwlocks also
>>> recursive. Also, this design of switching the synchronization scheme ensures
>>> that you can safely nest and use these locks in a very flexible manner.
[...]
>>>  void percpu_write_lock(struct percpu_rwlock *pcpu_rwlock)
>>>  {
>>> +       unsigned int cpu;
>>> +
>>> +       /*
>>> +        * Tell all readers that a writer is becoming active, so that they
>>> +        * start switching over to the global rwlock.
>>> +        */
>>> +       for_each_possible_cpu(cpu)
>>> +               per_cpu_ptr(pcpu_rwlock->rw_state, cpu)->writer_signal = true;
>>
>> I don't see anything preventing a race with the corresponding code in
>> percpu_write_unlock() that sets writer_signal back to false. Did I
>> miss something here ? It seems to me we don't have any guarantee that
>> all writer signals will be set to true at the end of the loop...
>>
> 
> Ah, thanks for pointing that out! IIRC Oleg had pointed this issue in the last
> version, but back then, I hadn't fully understood what he meant. Your
> explanation made it clear. I'll work on fixing this.
> 

We can fix this by using the simple patch (untested) shown below.
The alternative would be to acquire the rwlock for write, update the
->writer_signal values, release the lock, wait for readers to switch,
again acquire the rwlock for write with interrupts disabled etc... which
makes it kinda messy, IMHO. So I prefer the simple version shown below.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Michel Lespinasse Feb. 18, 2013, 6:07 p.m. UTC | #1
On Tue, Feb 19, 2013 at 1:56 AM, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> On 02/18/2013 09:51 PM, Srivatsa S. Bhat wrote:
>> On 02/18/2013 09:15 PM, Michel Lespinasse wrote:
>>> I don't see anything preventing a race with the corresponding code in
>>> percpu_write_unlock() that sets writer_signal back to false. Did I
>>> miss something here ? It seems to me we don't have any guarantee that
>>> all writer signals will be set to true at the end of the loop...
>>
>> Ah, thanks for pointing that out! IIRC Oleg had pointed this issue in the last
>> version, but back then, I hadn't fully understood what he meant. Your
>> explanation made it clear. I'll work on fixing this.
>
> We can fix this by using the simple patch (untested) shown below.
> The alternative would be to acquire the rwlock for write, update the
> ->writer_signal values, release the lock, wait for readers to switch,
> again acquire the rwlock for write with interrupts disabled etc... which
> makes it kinda messy, IMHO. So I prefer the simple version shown below.

Looks good.

Another alternative would be to make writer_signal an atomic integer
instead of a bool. That way writers can increment it before locking
and decrement it while unlocking.

To reduce the number of atomic ops during writer lock/unlock, the
writer_signal could also be a global read_mostly variable (I don't see
any downsides to that compared to having it percpu - or is it because
you wanted all the fastpath state to be in one single cacheline ?)
Srivatsa S. Bhat Feb. 18, 2013, 6:14 p.m. UTC | #2
On 02/18/2013 11:37 PM, Michel Lespinasse wrote:
> On Tue, Feb 19, 2013 at 1:56 AM, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> On 02/18/2013 09:51 PM, Srivatsa S. Bhat wrote:
>>> On 02/18/2013 09:15 PM, Michel Lespinasse wrote:
>>>> I don't see anything preventing a race with the corresponding code in
>>>> percpu_write_unlock() that sets writer_signal back to false. Did I
>>>> miss something here ? It seems to me we don't have any guarantee that
>>>> all writer signals will be set to true at the end of the loop...
>>>
>>> Ah, thanks for pointing that out! IIRC Oleg had pointed this issue in the last
>>> version, but back then, I hadn't fully understood what he meant. Your
>>> explanation made it clear. I'll work on fixing this.
>>
>> We can fix this by using the simple patch (untested) shown below.
>> The alternative would be to acquire the rwlock for write, update the
>> ->writer_signal values, release the lock, wait for readers to switch,
>> again acquire the rwlock for write with interrupts disabled etc... which
>> makes it kinda messy, IMHO. So I prefer the simple version shown below.
> 
> Looks good.
> 
> Another alternative would be to make writer_signal an atomic integer
> instead of a bool. That way writers can increment it before locking
> and decrement it while unlocking.
> 

Yep, that would also do. But the spinlock version looks simpler - no need
to check if the atomic counter is non-zero, no need to explicitly spin in
a tight-loop etc.

> To reduce the number of atomic ops during writer lock/unlock, the
> writer_signal could also be a global read_mostly variable (I don't see
> any downsides to that compared to having it percpu - or is it because
> you wanted all the fastpath state to be in one single cacheline ?)
> 

Yes, we (Oleg and I) debated for a while about global vs percpu, and then
finally decided to go with percpu to have cache benefits.

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/lib/percpu-rwlock.c b/lib/percpu-rwlock.c
index bf95e40..64ccd3f 100644
--- a/lib/percpu-rwlock.c
+++ b/lib/percpu-rwlock.c
@@ -50,6 +50,12 @@ 
 	(__this_cpu_read((pcpu_rwlock)->rw_state->writer_signal))
 
 
+/*
+ * Spinlock to synchronize access to the writer's data-structures
+ * (->writer_signal) from multiple writers.
+ */
+static DEFINE_SPINLOCK(writer_side_lock);
+
 int __percpu_init_rwlock(struct percpu_rwlock *pcpu_rwlock,
 			 const char *name, struct lock_class_key *rwlock_key)
 {
@@ -191,6 +197,8 @@  void percpu_write_lock_irqsave(struct percpu_rwlock *pcpu_rwlock,
 {
 	unsigned int cpu;
 
+	spin_lock(&writer_side_lock);
+
 	/*
 	 * Tell all readers that a writer is becoming active, so that they
 	 * start switching over to the global rwlock.
@@ -252,5 +260,6 @@  void percpu_write_unlock_irqrestore(struct percpu_rwlock *pcpu_rwlock,
 		per_cpu_ptr(pcpu_rwlock->rw_state, cpu)->writer_signal = false;
 
 	write_unlock_irqrestore(&pcpu_rwlock->global_rwlock, *flags);
+	spin_unlock(&writer_side_lock);
 }