diff mbox

[v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

Message ID 1369699930.3301.494.camel@edumazet-glaptop
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet May 28, 2013, 12:12 a.m. UTC
On Mon, 2013-05-27 at 21:55 +0400, Roman Gushchin wrote:
> Hi, Paul!
> 
> > On 25.05.2013 15:37, Paul E. McKenney wrote:
> >> Again, I believe that your retry logic needs to extend back into the
> >> calling function for your some_func() example above.
> 
> And what do you think about the following approach (diff below)?
> 
> It seems to me, it's enough clear (especially with good accompanying comments)
> and produces a good binary code (without significant overhead).
> Also, we will remove a hidden reef in using rcu-protected (h)list traverses with restarts.
> 

> diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
> index 2ae1371..4af5ee5 100644
> --- a/include/linux/rculist_nulls.h
> +++ b/include/linux/rculist_nulls.h
> @@ -107,7 +107,8 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
>    *
>    */
>   #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)                        \
> -       for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));            \
> +       for (ACCESS_ONCE(*(head)),                                              \
> +               pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));         \
>                  (!is_a_nulls(pos)) &&                                           \
>                  ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
>                  pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))

It looks like this still relies on gcc being friendly here.

I repeat again : @head here is a constant.

Macro already uses ACCESS_ONCE(), we only have to instruct gcc that
caching the value is forbidden if we restart the loop 
(aka "goto begin;" see Documentation/RCU/rculist_nulls.txt line 146)

Adding a barrier() is probably what we want.

I cooked followed patch and it fixes the problem.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Roman Gushchin May 28, 2013, 9:10 a.m. UTC | #1
On 28.05.2013 04:12, Eric Dumazet wrote:
> On Mon, 2013-05-27 at 21:55 +0400, Roman Gushchin wrote:
>> Hi, Paul!
>>
>>> On 25.05.2013 15:37, Paul E. McKenney wrote:
>>>> Again, I believe that your retry logic needs to extend back into the
>>>> calling function for your some_func() example above.
>>
>> And what do you think about the following approach (diff below)?
>>
>> It seems to me, it's enough clear (especially with good accompanying comments)
>> and produces a good binary code (without significant overhead).
>> Also, we will remove a hidden reef in using rcu-protected (h)list traverses with restarts.
>>
>
>> diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
>> index 2ae1371..4af5ee5 100644
>> --- a/include/linux/rculist_nulls.h
>> +++ b/include/linux/rculist_nulls.h
>> @@ -107,7 +107,8 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
>>     *
>>     */
>>    #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)                        \
>> -       for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));            \
>> +       for (ACCESS_ONCE(*(head)),                                              \
>> +               pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));         \
>>                   (!is_a_nulls(pos)) &&                                           \
>>                   ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
>>                   pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))
>
> It looks like this still relies on gcc being friendly here.
>
> I repeat again : @head here is a constant.

No.
Actually, there are two volatile objects: pointer to the first element (as a part of the head structure),
and the first element by itself. So, to be strict, head as a structure contains a volatile field.
Head->first should be treated as a volatile pointer to a volatile data. So, the whole head object is volatile.

>
> Macro already uses ACCESS_ONCE(), we only have to instruct gcc that
> caching the value is forbidden if we restart the loop
> (aka "goto begin;" see Documentation/RCU/rculist_nulls.txt line 146)

My patch seems to be correct, because, ACCESS_ONCE(*(head)) will cause gcc to (re)read head data from the memory.
According to gcc documentation:
	"A scalar volatile object is read when it is accessed in a void context:
      		volatile int *src = somevalue;
      		*src;
	Such expressions are rvalues, and GCC implements this as a read of the volatile object being pointed to."
And this is exactly our case.

> Adding a barrier() is probably what we want.

I agree, inserting barrier() is also a correct and working fix.

Thanks!

Regards,
Roman
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet May 29, 2013, 12:34 a.m. UTC | #2
On Tue, 2013-05-28 at 13:10 +0400, Roman Gushchin wrote:
> On 28.05.2013 04:12, Eric Dumazet wrote:

> > Adding a barrier() is probably what we want.
> 
> I agree, inserting barrier() is also a correct and working fix.

Yeah, but I can not find a clean way to put it inside the "for (;;)"

for (barrier();;)  ->

error: expected expression before ‘__asm__’

No user currently does :

if (condition)
	hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)

But who knows...


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney May 29, 2013, 1:19 a.m. UTC | #3
On Tue, May 28, 2013 at 01:10:46PM +0400, Roman Gushchin wrote:
> On 28.05.2013 04:12, Eric Dumazet wrote:
> >On Mon, 2013-05-27 at 21:55 +0400, Roman Gushchin wrote:
> >>Hi, Paul!
> >>
> >>>On 25.05.2013 15:37, Paul E. McKenney wrote:
> >>>>Again, I believe that your retry logic needs to extend back into the
> >>>>calling function for your some_func() example above.
> >>
> >>And what do you think about the following approach (diff below)?
> >>
> >>It seems to me, it's enough clear (especially with good accompanying comments)
> >>and produces a good binary code (without significant overhead).
> >>Also, we will remove a hidden reef in using rcu-protected (h)list traverses with restarts.
> >>
> >
> >>diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
> >>index 2ae1371..4af5ee5 100644
> >>--- a/include/linux/rculist_nulls.h
> >>+++ b/include/linux/rculist_nulls.h
> >>@@ -107,7 +107,8 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
> >>    *
> >>    */
> >>   #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)                        \
> >>-       for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));            \
> >>+       for (ACCESS_ONCE(*(head)),                                              \
> >>+               pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));         \
> >>                  (!is_a_nulls(pos)) &&                                           \
> >>                  ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
> >>                  pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))
> >
> >It looks like this still relies on gcc being friendly here.
> >
> >I repeat again : @head here is a constant.
> 
> No.
> Actually, there are two volatile objects: pointer to the first element (as a part of the head structure),
> and the first element by itself. So, to be strict, head as a structure contains a volatile field.
> Head->first should be treated as a volatile pointer to a volatile data. So, the whole head object is volatile.
> 
> >
> >Macro already uses ACCESS_ONCE(), we only have to instruct gcc that
> >caching the value is forbidden if we restart the loop
> >(aka "goto begin;" see Documentation/RCU/rculist_nulls.txt line 146)
> 
> My patch seems to be correct, because, ACCESS_ONCE(*(head)) will cause gcc to (re)read head data from the memory.
> According to gcc documentation:
> 	"A scalar volatile object is read when it is accessed in a void context:
>      		volatile int *src = somevalue;
>      		*src;
> 	Such expressions are rvalues, and GCC implements this as a read of the volatile object being pointed to."
> And this is exactly our case.
> 
> >Adding a barrier() is probably what we want.
> 
> I agree, inserting barrier() is also a correct and working fix.

I have to ask this question of both of you...

Are either of Eric's or Roman's fixes really guaranteed to work if gcc
elects not to inline the function containing the RCU list traversal?

							Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney May 29, 2013, 1:31 a.m. UTC | #4
On Tue, May 28, 2013 at 05:34:53PM -0700, Eric Dumazet wrote:
> On Tue, 2013-05-28 at 13:10 +0400, Roman Gushchin wrote:
> > On 28.05.2013 04:12, Eric Dumazet wrote:
> 
> > > Adding a barrier() is probably what we want.
> > 
> > I agree, inserting barrier() is also a correct and working fix.
> 
> Yeah, but I can not find a clean way to put it inside the "for (;;)"
> 
> for (barrier();;)  ->
> 
> error: expected expression before ‘__asm__’
> 
> No user currently does :
> 
> if (condition)
> 	hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)
> 
> But who knows...

I still have my earlier question, but I suggest "({ barrier(); XXX })"
to put the barrier into the for loop, either in the second or third
clause, where XXX was the original second or third clause.

							Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet May 29, 2013, 5:08 a.m. UTC | #5
On Tue, 2013-05-28 at 18:31 -0700, Paul E. McKenney wrote:
> On Tue, May 28, 2013 at 05:34:53PM -0700, Eric Dumazet wrote:
> > On Tue, 2013-05-28 at 13:10 +0400, Roman Gushchin wrote:
> > > On 28.05.2013 04:12, Eric Dumazet wrote:
> > 
> > > > Adding a barrier() is probably what we want.
> > > 
> > > I agree, inserting barrier() is also a correct and working fix.
> > 
> > Yeah, but I can not find a clean way to put it inside the "for (;;)"
> > 
> > for (barrier();;)  ->
> > 
> > error: expected expression before ‘__asm__’
> > 
> > No user currently does :
> > 
> > if (condition)
> > 	hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)
> > 
> > But who knows...
> 
> I still have my earlier question, but I suggest "({ barrier(); XXX })"
> to put the barrier into the for loop, either in the second or third
> clause, where XXX was the original second or third clause.
> 
> 	

Hmm, it doesn't work, I wanted to replace :

barrier();
for (expr1 ; expr2; expr3) {

by :

for ( some_clever_thing ; expr2; expr3) {

So barrier() should not be in second or third clause : it must be done
once and before "expr1".

Not a big deal anyway, we're not adding new
hlist_nulls_for_each_entry_rcu() users :)


About your earlier question, I really don't know why compiler would
cache a memory read if we explicitly use barrier() to prevent this from
happening.

BTW Roman patch generates a double load as in :

2cb1:       49 8b 07                mov    (%r15),%rax    
2cb4:       49 8b 07                mov    (%r15),%rax        


...
2ea2:       e8 f9 dc ff ff          callq  ba0 <sock_put>
2ea7:       8b 0c 24                mov    (%rsp),%ecx
2eaa:       e9 02 fe ff ff          jmpq   2cb1 <udp4_lib_lookup2+0x91> 

because of ACCESS_ONCE() used twice, once explicitly in
hlist_nulls_for_each_entry_rcu(), and once in rcu_dereference_raw()

While barrier();ptr = rcu_dereference(X); does generate a single load.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Roman Gushchin May 29, 2013, 10:09 a.m. UTC | #6
On 29.05.2013 09:08, Eric Dumazet wrote:
> On Tue, 2013-05-28 at 18:31 -0700, Paul E. McKenney wrote:
>> On Tue, May 28, 2013 at 05:34:53PM -0700, Eric Dumazet wrote:
>>> On Tue, 2013-05-28 at 13:10 +0400, Roman Gushchin wrote:
>>>> On 28.05.2013 04:12, Eric Dumazet wrote:

> About your earlier question, I really don't know why compiler would
> cache a memory read if we explicitly use barrier() to prevent this from
> happening.

I agree.

> BTW Roman patch generates a double load as in :
>
> 2cb1:       49 8b 07                mov    (%r15),%rax
> 2cb4:       49 8b 07                mov    (%r15),%rax
>
>
> ...
> 2ea2:       e8 f9 dc ff ff          callq  ba0 <sock_put>
> 2ea7:       8b 0c 24                mov    (%rsp),%ecx
> 2eaa:       e9 02 fe ff ff          jmpq   2cb1 <udp4_lib_lookup2+0x91>
>
> because of ACCESS_ONCE() used twice, once explicitly in
> hlist_nulls_for_each_entry_rcu(), and once in rcu_dereference_raw()
>
> While barrier();ptr = rcu_dereference(X); does generate a single load.

It's true.

Unfortunately, using barrier() can also prevent gcc from making some
(acceptable) code optimizations, because barrier() has a global effect,
and everything we want to reload is the (head->first) pointer.
So, to be absolutely precise, we have to introduce and use
the ACCESS_FIELD_ONCE() macro.

In any case, it doesn't look like a big problem.
In my mind, the best solution is to use the ACCESS_FIELD_ONCE() macro,
but using barrier() is also an acceptable solution.

Regards,
Roman
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
index 2ae1371..4dc51b2 100644
--- a/include/linux/rculist_nulls.h
+++ b/include/linux/rculist_nulls.h
@@ -105,8 +105,12 @@  static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
  * @head:	the head for your list.
  * @member:	the name of the hlist_nulls_node within the struct.
  *
+ * The barrier() is needed to make sure compiler doesn't cache first element,
+ * as this loop can be restarted.
+ * (cf Documentation/RCU/rculist_nulls.txt around line 146)
  */
 #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)			\
+	barrier();								\
 	for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));		\
 		(!is_a_nulls(pos)) &&						\
 		({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \