diff mbox

[net-next,v4,1/6] bonding: simplify and use RCU protection for 3ad xmit path

Message ID 20130907142041.GA20237@redhat.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Veaceslav Falico Sept. 7, 2013, 2:20 p.m. UTC
On Fri, Sep 06, 2013 at 03:28:07PM +0800, Ding Tianhong wrote:
...snip...
>+/**
>+ * IMPORTANT: bond_first/last_slave_rcu can return NULL in case of an empty list
>+ * Caller must hold rcu_read_lock
>+ */
>+#define bond_first_slave_rcu(bond) \
>+	(bond_is_empty_rcu(bond) ? NULL : \
>+					bond_to_slave_rcu((bond)->slave_list.next))
>+#define bond_last_slave_rcu(bond) \
>+	(bond_is_empty_rcu(bond) ? NULL : \
>+					bond_to_slave_rcu((bond)->slave_list.prev))
>+

Hi Ding,

I didn't have time to actually go into detail about this last week, and, as
Eric correctly said, RCU is hard even for experienced kernel developers, so
it's really not the easiest part to understand. I, for one, can't say that
I understand it completely, though I know the overall, basic usage.

That being said, I really advise you to go through, first of all,
Documentation/RCU. There are also tons of online materials already written
on this topic - and every one takes its own approach in explaining it.

I'll now try to explain what is wrong with the approach of verifying for
list emptyness while holding only RCU read lock.

First of all, you've replied to one of my emails that, quoting - "the
slave_list will not changed in the rcu_read_lock". This is completely wrong
- rcu_read_lock() is not a lock, in its classical meaning. It allows the
    list, and everything else, to be *modified* (in the most basic case) while
holding it. I.e. if we, concurrently, run list_for_each_entry_rcu() and, on
another CPU, list_del_rcu() - then we might end up holding a reference to
the entry that *was* deleted from the list already. In other words - the
list can be freely modified while we're holding rcu_read_lock().

What rcu_read_lock() *does* guarantee is that the entry we've dereferenced
won't 'go away' - as in - get freed, get deinitialized (in most cases, and
in this one - with slave_list) or whatever else that can stop us from using
it. That's, again, a really broad assumption and there are *lots* of small
things to be taken care of, and, in some cases it's not even true -
however, the basic idea is this - once you rcu_dereference() something -
you can use it while holding rcu_read_lock().

So, again - while holding the rcu_read_lock(), working with a list - the
list itself can change easily, can become empty, or can become non-empty,
or whatever else - even while going through it via
list_for_each_entry_rcu(). However, if we dereference an entry there - we
can use it.

Now, about the list emptiness under rcu_read_lock(). Lets suppose the
following code (imagine that we have list_empty_rcu(), and we're holding
the rcu_read_lock()):

    1 if (!list_empty_rcu(&my_list)) {
    2         do_something(get_first_entry(&my_list));
    3 }

on the 1st line, we verify if the list is not empty - and, sometimes, it
will be true. For example, we'll have only one element there.

Now, after we execute the first line, on another CPU we *remove* the only
remaining element from the list, and thus my_list->next will point to
&my_list - classical empty list.

So, back to the first CPU, when we run our code, we execute
get_first_entry(&my_list) - which, usually, translates to
container_of(my_list->next, ...) - getting the container of the next
element of the list - which is our *head* - and not a valid entry. And we
will, eventually, bug here.

This is exactly what you're doing:

+#define bond_last_slave_rcu(bond) \
+	(bond_is_empty_rcu(bond) ? NULL : \
+					bond_to_slave_rcu((bond)->slave_list.prev))

You first verify if the list is empty - which can be non-empty *in the time
of the verification* - and then get the bond_to_slave_rcu() of the .prev -
when the list can already become empty and the prev will point to the list
head, and not to a valid slave.

That's exactly why we don't have the list_empty_rcu() - because it's
meaningless - cause exactly after it was run the list can become empty or
non-empty. It guarantees us nothing. The only, maybe, slightly useful usage
of it could be some kind of optimization:

   1 rcu_read_lock();
   2 
   3 if (unlikely(list_empty(&bond->slave_list))) {
   4         rcu_read_unlock();
   5         return -ENODEV;
   6 }
   7 
   8 heavy_preparation1(bond);
   9 heavy_preparation2(bond);
  10 ...
  11 
  12 bond_for_each_slave(bond, slave)
  13         do_something_with_a_slave(slave);
  14 
  15 rcu_read_unlock();

Here, as you can see, we must make some heavy (time/memory-consuming)
preparations to the bond *if it has slaves* before working *with each
slave*, and if he doesn't - we can just omit those preparations (if we
can...). But still it's a bit useless, and usually we anyway need proper
RTNL or bond->lock locking before doing something like that.

Back to the real world - to implement the bond_first/last_slave_rcu(), we
*must not* rely on list emptiness before taking the reference to the
first/last element of the list. To make this happen, we *first* take the
reference to list.next/list.prev, and only *after* it we verify if what
we've dereferenced is an actual slave or the list head (i.e. the list was
empty). If it's the list head - then we assume that the list was empty and
return NULL, otherwise - if it's a normal slave - we return the
container_of(what we've dereferenced).

Here's the function that gets the first element in the list:

268 #define list_first_or_null_rcu(ptr, type, member) \
269         ({struct list_head *__ptr = (ptr); \
270           struct list_head __rcu *__next = list_next_rcu(__ptr); \
271           likely(__ptr != __next) ? container_of(__next, type, member) : NULL; \
272         })

As you can see, we first make a copy of the list_head pointer, which we
were given - cause in the meanwhile *it can also, possibly, change*.

269         ({struct list_head *__ptr = (ptr); \

Then we dereference the pointer's copy.next, and also save it - cause it
can *also* change - become (non)empty, for example.

270           struct list_head __rcu *__next = list_next_rcu(__ptr); \

And here comes the rcu magic, actually. We know that the ptr can change,
the ptr.next can also change, however we also know that if ptr.next, when
we've dereferenced it, while holding the rcu_read_lock(), was a normal list
element - then we're sure that this element *will not go away*. Thus,
everything what remained for us is just to verify - is our next pointer the
list head or not? I.e. do we actually have an element in __next, or was the
list empty at that time and we can return NULL? And, of course, if the list
was not empty - we have a valid __next, and we can return its container:

271           likely(__ptr != __next) ? container_of(__next, type, member) : NULL; \

(Also, mind the "likely(__ptr != __next)" - usually, and bonding is a very
good example, we always have at least one element in the list.)

This way, I'd recommend you to do bond_last_slave_rcu() the same way as
here - you, though, might omit saving the slave_list pointer, it's not
needed in case of bonding AFAIK. Something like that (I'm writing it in my
mail editor - so it's only for the reference):

#define bond_last_slave_rcu(bond) \
	({struct list_head *__slave_ptr = list_next_rcu(&bond->slave_list); \
	  likely(__slave_ptr != &bond->slave_list) ? \
		bond_to_slave_rcu(__slave_ptr) : NULL;})


Or, even better (from my POV), add a generic macro to rculist.h (again,
didn't even compile it) - it can be used later on:

------- END OF PATCH ------

Anyway, it's up to you.

Hope that helps.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Nikolay Aleksandrov Sept. 7, 2013, 2:45 p.m. UTC | #1
On 09/07/2013 04:20 PM, Veaceslav Falico wrote:
> On Fri, Sep 06, 2013 at 03:28:07PM +0800, Ding Tianhong wrote:
<snip>
> (Also, mind the "likely(__ptr != __next)" - usually, and bonding is a very
> good example, we always have at least one element in the list.)
> 
> This way, I'd recommend you to do bond_last_slave_rcu() the same way as
> here - you, though, might omit saving the slave_list pointer, it's not
> needed in case of bonding AFAIK. Something like that (I'm writing it in my
> mail editor - so it's only for the reference):
> 
> #define bond_last_slave_rcu(bond) \
>     ({struct list_head *__slave_ptr = list_next_rcu(&bond->slave_list); \
>       likely(__slave_ptr != &bond->slave_list) ? \
>         bond_to_slave_rcu(__slave_ptr) : NULL;})
>
> 
> Or, even better (from my POV), add a generic macro to rculist.h (again,
> didn't even compile it) - it can be used later on:
> 
> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> index f4b1001..37b49d1 100644
> --- a/include/linux/rculist.h
> +++ b/include/linux/rculist.h
> @@ -23,6 +23,7 @@
>   * way, we must not access it directly
>   */
>  #define list_next_rcu(list)    (*((struct list_head __rcu
> **)(&(list)->next)))
> +#define list_prev_rcu(list)    (*((struct list_head __rcu
> **)(&(list)->prev)))
>  
>  /*
>   * Insert a new entry between two known consecutive entries.
> @@ -271,6 +272,12 @@ static inline void list_splice_init_rcu(struct
> list_head *list,
>        likely(__ptr != __next) ? container_of(__next, type, member) : NULL; \
>      })
>  
> +#define list_last_or_null_rcu(ptr, type, member) \
> +    ({struct list_head *__ptr = (ptr); \
> +      struct list_head __rcu *__last = list_prev_rcu(__ptr); \
> +      likely(__ptr != __last) ? container_of(__prev, type, member) : NULL; \
> +    })
> +
Hi,
Actually I don't think you can dereference ->prev and use the standard
list_del_rcu because it guarantees only the ->next ptr will be valid and
->prev is set to LIST_POISON2.
IMO, you'll need something like this: https://lkml.org/lkml/2012/7/25/193
with the bidir_del and all that.

But in any case I complete agree with Veaceslav here. Read all the
documentation carefully :-)

Cheers,
 Nik

>  /**
>   * list_for_each_entry_rcu    -    iterate over rcu list of given type
>   * @pos:    the type * to use as a loop cursor.
> ------- END OF PATCH ------
> 
> Anyway, it's up to you.
> 
> Hope that helps.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Veaceslav Falico Sept. 7, 2013, 3:03 p.m. UTC | #2
On Sat, Sep 07, 2013 at 04:45:05PM +0200, Nikolay Aleksandrov wrote:
>
>On 09/07/2013 04:20 PM, Veaceslav Falico wrote:
>> On Fri, Sep 06, 2013 at 03:28:07PM +0800, Ding Tianhong wrote:
...snip...
>> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
>> index f4b1001..37b49d1 100644
>> --- a/include/linux/rculist.h
>> +++ b/include/linux/rculist.h
>> @@ -23,6 +23,7 @@
>>   * way, we must not access it directly
>>   */
>>  #define list_next_rcu(list)    (*((struct list_head __rcu
>> **)(&(list)->next)))
>> +#define list_prev_rcu(list)    (*((struct list_head __rcu
>> **)(&(list)->prev)))
>>
>>  /*
>>   * Insert a new entry between two known consecutive entries.
>> @@ -271,6 +272,12 @@ static inline void list_splice_init_rcu(struct
>> list_head *list,
>>        likely(__ptr != __next) ? container_of(__next, type, member) : NULL; \
>>      })
>>
>> +#define list_last_or_null_rcu(ptr, type, member) \
>> +    ({struct list_head *__ptr = (ptr); \
>> +      struct list_head __rcu *__last = list_prev_rcu(__ptr); \
>> +      likely(__ptr != __last) ? container_of(__prev, type, member) : NULL; \
>> +    })
>> +
>Hi,
>Actually I don't think you can dereference ->prev and use the standard
>list_del_rcu because it guarantees only the ->next ptr will be valid and
>->prev is set to LIST_POISON2.
>IMO, you'll need something like this: https://lkml.org/lkml/2012/7/25/193
>with the bidir_del and all that.

Yeah, right, my bad - we can rely only on the ->next pointer, indeed,
missed that part. RCU is hard :).

So it'll be a lot harder to implement bond_last_slave_rcu() in a
'straightforward' approach.

I'd rather go in the opposite direction here - i.e. drop the 'reverse'
traversal completely, and all the use cases for bond_last_slave_rcu(). I've
got some patches already - http://patchwork.ozlabs.org/patch/272076/ doing
that, and hopefully will remove the whole 'backword' traversal completely
in the future.

>
>But in any case I complete agree with Veaceslav here. Read all the
>documentation carefully :-)
>
>Cheers,
> Nik
>
>>  /**
>>   * list_for_each_entry_rcu    -    iterate over rcu list of given type
>>   * @pos:    the type * to use as a loop cursor.
>> ------- END OF PATCH ------
>>
>> Anyway, it's up to you.
>>
>> Hope that helps.
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ding Tianhong Sept. 8, 2013, 6:05 a.m. UTC | #3
于 2013/9/7 23:03, Veaceslav Falico 写道:
> On Sat, Sep 07, 2013 at 04:45:05PM +0200, Nikolay Aleksandrov wrote:
>>
>> On 09/07/2013 04:20 PM, Veaceslav Falico wrote:
>>> On Fri, Sep 06, 2013 at 03:28:07PM +0800, Ding Tianhong wrote:
> ...snip...
>>> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
>>> index f4b1001..37b49d1 100644
>>> --- a/include/linux/rculist.h
>>> +++ b/include/linux/rculist.h
>>> @@ -23,6 +23,7 @@
>>> * way, we must not access it directly
>>> */
>>> #define list_next_rcu(list) (*((struct list_head __rcu
>>> **)(&(list)->next)))
>>> +#define list_prev_rcu(list) (*((struct list_head __rcu
>>> **)(&(list)->prev)))
>>>
>>> /*
>>> * Insert a new entry between two known consecutive entries.
>>> @@ -271,6 +272,12 @@ static inline void list_splice_init_rcu(struct
>>> list_head *list,
>>> likely(__ptr != __next) ? container_of(__next, type, member) : NULL; \
>>> })
>>>
>>> +#define list_last_or_null_rcu(ptr, type, member) \
>>> + ({struct list_head *__ptr = (ptr); \
>>> + struct list_head __rcu *__last = list_prev_rcu(__ptr); \
>>> + likely(__ptr != __last) ? container_of(__prev, type, member) : 
>>> NULL; \
>>> + })
>>> +
>> Hi,
>> Actually I don't think you can dereference ->prev and use the standard
>> list_del_rcu because it guarantees only the ->next ptr will be valid and
>> ->prev is set to LIST_POISON2.
>> IMO, you'll need something like this: 
>> https://lkml.org/lkml/2012/7/25/193
>> with the bidir_del and all that.
>
> Yeah, right, my bad - we can rely only on the ->next pointer, indeed,
> missed that part. RCU is hard :).
>
> So it'll be a lot harder to implement bond_last_slave_rcu() in a
> 'straightforward' approach.
>
> I'd rather go in the opposite direction here - i.e. drop the 'reverse'
> traversal completely, and all the use cases for bond_last_slave_rcu(). 
> I've
> got some patches already - http://patchwork.ozlabs.org/patch/272076/ 
> doing
> that, and hopefully will remove the whole 'backword' traversal completely
> in the future.
>

Although RCU is truely difficult to understand, but it is very 
interesting and beautiful,
I will follow your footsteps and finish it.

>>
>> But in any case I complete agree with Veaceslav here. Read all the
>> documentation carefully :-)
>>
>> Cheers,
>> Nik
>>

good lession, read it complete and agreed, it is a hard work to do.
I need more time to review the Veaceslav's patchset and improve my
thought, thanks Nik and Veaceslav.

Best regards
Ding

>>> /**
>>> * list_for_each_entry_rcu - iterate over rcu list of given type
>>> * @pos: the type * to use as a loop cursor.
>>> ------- END OF PATCH ------
>>>
>>> Anyway, it's up to you.
>>>
>>> Hope that helps.
>>
> -- 
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ding Tianhong Sept. 9, 2013, 8:58 a.m. UTC | #4
On 2013/9/8 14:05, Ding Tianhong wrote:

Hi Veaceslav and Nik:

please take a moment to reveiw the function just modify for bond_XXX_rcu,
and give me some advice. thanks for the help again.:)

+#define bond_first_slave_rcu(bond) \
+   list_first_or_null_rcu(&(bond)->slave_list, struct slave, list);
+#define bond_last_slave_rcu(bond) \
+   ({struct list_head *__slave_list = &(bond)->slave_list; \
+    struct list_head __rcu *__prev = \
+    (*((struct list_head __rcu **)(&(__slave_list)->prev)));\
+    likely(__slave_list != __prev) ? \
+    container_of(__prev, struct slave, list) : NULL;})
+
 #define bond_is_first_slave(bond, pos) ((pos)->list.prev == &(bond)->slave_list)
 #define bond_is_last_slave(bond, pos) ((pos)->list.next == &(bond)->slave_list)

@@ -93,6 +117,29 @@
    (bond_is_first_slave(bond, pos) ? bond_last_slave(bond) : \
                      bond_to_slave((pos)->list.prev))

+/* Since bond_first/last_slave_rcu can return NULL, these can return NULL too */
+#define bond_next_slave_rcu(bond, pos) \
+   ({struct list_head *__slave_list = &(bond)->slave_list; \
+    struct list_head __rcu *__next = list_next_rcu(__slave_list); \
+    struct list_head *__pos_list = &(pos)->list; \
+    struct list_head __rcu *__pos_next = list_next_rcu(__pos_list); \
+    likely(__pos_next != __slave_list) ? \
+    container_of(__pos_next, struct slave, list) : \
+    container_of(__next, struct slave, list); \
+    })
+
+#define bond_prev_slave_rcu(bond, pos) \
+   ({struct list_head *__slave_list = &(bond)->slave_list; \
+    struct list_head __rcu *__prev = \
+    (*((struct list_head __rcu **)(&(__slave_list)->prev)));\
+    struct list_head *__pos_list = &(pos)->list; \
+    struct list_head __rcu *__pos_prev = (__pos_list->prev != LIST_POISON2) ? \
+    (*((struct list_head __rcu **)(&(__pos_list)->prev))) : NULL; \
+    likely(__pos_prev != __slave_list) ? \
+    ((__pos_prev) ? list_entry_rcu(__pos_prev, struct slave, list) : NULL;) : \
+    (list_entry_rcu(__prev, struct slave, list)); \
+    })
+


-#define bond_for_each_slave_from(bond, pos, cnt, start) \
-   for (cnt = 0, pos = start; pos && cnt < (bond)->slave_cnt; \
-        cnt++, pos = bond_next_slave(bond, pos))
-
+#define bond_for_each_slave_from(bond, pos, start) \
+   for (pos = start; pos; (pos = bond_next_slave(bond, pos)) != start ? \
+       (pos) : (pos = NULL))
+
+#define bond_for_each_slave_from_rcu(bond, pos, start) \
+   for ({struct list_head *__start = &(start)->list; \
+        struct list_head *__slave_list = &(bond)->slave_list; \
+        pos = list_entry_rcu(__start, struct slave, list);}; \
+        pos; \
+        {struct list_head __rcu *__next = list_next_rcu(pos->next); \
+        __next != __slave_list ? \
+        __next : __next = list_next_rcu(__next->next); \
+        __next != __start ? \
+        pos = list_entry_rcu(__next, struct slave, list) : \
+        pos = NULL; \
+        })
+

Best regards
Ding


>> -- 
>> To unsubscribe from this list: send the line "unsubsc ribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
> 
> 
> .
> 


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Veaceslav Falico Sept. 9, 2013, 9:57 a.m. UTC | #5
On Mon, Sep 09, 2013 at 04:58:29PM +0800, Ding Tianhong wrote:
>On 2013/9/8 14:05, Ding Tianhong wrote:
>
>Hi Veaceslav and Nik:
>
>please take a moment to reveiw the function just modify for bond_XXX_rcu,
>and give me some advice. thanks for the help again.:)
>
>+#define bond_first_slave_rcu(bond) \
>+   list_first_or_null_rcu(&(bond)->slave_list, struct slave, list);
>+#define bond_last_slave_rcu(bond) \
>+   ({struct list_head *__slave_list = &(bond)->slave_list; \
>+    struct list_head __rcu *__prev = \
>+    (*((struct list_head __rcu **)(&(__slave_list)->prev)));\
>+    likely(__slave_list != __prev) ? \
>+    container_of(__prev, struct slave, list) : NULL;})

Please take a look at Nikolay's reply to my RCU email -
http://www.spinics.net/lists/netdev/msg249805.html . And mine also, to his
email. In short - RCU doesn't guarantee ->prev, so better take the approach
of eliminating bond_last/prev_slave completely.

>+
> #define bond_is_first_slave(bond, pos) ((pos)->list.prev == &(bond)->slave_list)
> #define bond_is_last_slave(bond, pos) ((pos)->list.next == &(bond)->slave_list)
>
>@@ -93,6 +117,29 @@
>    (bond_is_first_slave(bond, pos) ? bond_last_slave(bond) : \
>                      bond_to_slave((pos)->list.prev))
>
>+/* Since bond_first/last_slave_rcu can return NULL, these can return NULL too */
>+#define bond_next_slave_rcu(bond, pos) \
>+   ({struct list_head *__slave_list = &(bond)->slave_list; \
>+    struct list_head __rcu *__next = list_next_rcu(__slave_list); \
>+    struct list_head *__pos_list = &(pos)->list; \
>+    struct list_head __rcu *__pos_next = list_next_rcu(__pos_list); \
>+    likely(__pos_next != __slave_list) ? \
>+    container_of(__pos_next, struct slave, list) : \
>+    container_of(__next, struct slave, list); \
>+    })

Nice, but can be shortened - we know that pos won't go away.

>+
>+#define bond_prev_slave_rcu(bond, pos) \
>+   ({struct list_head *__slave_list = &(bond)->slave_list; \
>+    struct list_head __rcu *__prev = \
>+    (*((struct list_head __rcu **)(&(__slave_list)->prev)));\
>+    struct list_head *__pos_list = &(pos)->list; \
>+    struct list_head __rcu *__pos_prev = (__pos_list->prev != LIST_POISON2) ? \
>+    (*((struct list_head __rcu **)(&(__pos_list)->prev))) : NULL; \
>+    likely(__pos_prev != __slave_list) ? \
>+    ((__pos_prev) ? list_entry_rcu(__pos_prev, struct slave, list) : NULL;) : \
>+    (list_entry_rcu(__prev, struct slave, list)); \
>+    })

Same remark as above about prev.

>+
>
>
>-#define bond_for_each_slave_from(bond, pos, cnt, start) \
>-   for (cnt = 0, pos = start; pos && cnt < (bond)->slave_cnt; \
>-        cnt++, pos = bond_next_slave(bond, pos))
>-
>+#define bond_for_each_slave_from(bond, pos, start) \
>+   for (pos = start; pos; (pos = bond_next_slave(bond, pos)) != start ? \
>+       (pos) : (pos = NULL))
>+
>+#define bond_for_each_slave_from_rcu(bond, pos, start) \
>+   for ({struct list_head *__start = &(start)->list; \
>+        struct list_head *__slave_list = &(bond)->slave_list; \
>+        pos = list_entry_rcu(__start, struct slave, list);}; \
>+        pos; \
>+        {struct list_head __rcu *__next = list_next_rcu(pos->next); \
>+        __next != __slave_list ? \
>+        __next : __next = list_next_rcu(__next->next); \
>+        __next != __start ? \
>+        pos = list_entry_rcu(__next, struct slave, list) : \
>+        pos = NULL; \
>+        })

Jeez, I don't even want to review it. It's too complex and too hard to
maintain, even if it works. Can you please make something shorter/easier to
understand?

>+
>
>Best regards
>Ding
>
>
>>> --
>>> To unsubscribe from this list: send the line "unsubsc ribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>
>>
>> .
>>
>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ding Tianhong Sept. 9, 2013, 2:53 p.m. UTC | #6
于 2013/9/9 17:57, Veaceslav Falico 写道:
> On Mon, Sep 09, 2013 at 04:58:29PM +0800, Ding Tianhong wrote:
>> On 2013/9/8 14:05, Ding Tianhong wrote:
>>
>> Hi Veaceslav and Nik:
>>
>> please take a moment to reveiw the function just modify for 
>> bond_XXX_rcu,
>> and give me some advice. thanks for the help again.:)
>>
>> +#define bond_first_slave_rcu(bond) \
>> + list_first_or_null_rcu(&(bond)->slave_list, struct slave, list);
>> +#define bond_last_slave_rcu(bond) \
>> + ({struct list_head *__slave_list = &(bond)->slave_list; \
>> + struct list_head __rcu *__prev = \
>> + (*((struct list_head __rcu **)(&(__slave_list)->prev)));\
>> + likely(__slave_list != __prev) ? \
>> + container_of(__prev, struct slave, list) : NULL;})
>
> Please take a look at Nikolay's reply to my RCU email -
> http://www.spinics.net/lists/netdev/msg249805.html . And mine also, to 
> his
> email. In short - RCU doesn't guarantee ->prev, so better take the 
> approach
> of eliminating bond_last/prev_slave completely.
>
yes, I see the message, the list_del_rcu will make the slave->list 
->prev = LIST_POISON2,
the bond->slave_list will not be set to the messae, the prev will point 
a slave->list or itself,
so I think it will be ok here, please correct me if I miss something.

Best Regards
Ding

>> +
>> #define bond_is_first_slave(bond, pos) ((pos)->list.prev == 
>> &(bond)->slave_list)
>> #define bond_is_last_slave(bond, pos) ((pos)->list.next == 
>> &(bond)->slave_list)
>>
>> @@ -93,6 +117,29 @@
>> (bond_is_first_slave(bond, pos) ? bond_last_slave(bond) : \
>> bond_to_slave((pos)->list.prev))
>>
>> +/* Since bond_first/last_slave_rcu can return NULL, these can return 
>> NULL too */
>> +#define bond_next_slave_rcu(bond, pos) \
>> + ({struct list_head *__slave_list = &(bond)->slave_list; \
>> + struct list_head __rcu *__next = list_next_rcu(__slave_list); \
>> + struct list_head *__pos_list = &(pos)->list; \
>> + struct list_head __rcu *__pos_next = list_next_rcu(__pos_list); \
>> + likely(__pos_next != __slave_list) ? \
>> + container_of(__pos_next, struct slave, list) : \
>> + container_of(__next, struct slave, list); \
>> + })
>
> Nice, but can be shortened - we know that pos won't go away.

OK, clean it soon.

>
>> +
>> +#define bond_prev_slave_rcu(bond, pos) \
>> + ({struct list_head *__slave_list = &(bond)->slave_list; \
>> + struct list_head __rcu *__prev = \
>> + (*((struct list_head __rcu **)(&(__slave_list)->prev)));\
>> + struct list_head *__pos_list = &(pos)->list; \
>> + struct list_head __rcu *__pos_prev = (__pos_list->prev 
>> !=LIST_POISON2) ? \
yes, the pos->list will be set to LIST_POISON2 by list_del_rcu, so I add 
a check for it, But
take the approach of eliminating bond_last/prev_slave completely is a 
wise decision, I agree.

>> + (*((struct list_head __rcu **)(&(__pos_list)->prev))) : NULL; \
>> + likely(__pos_prev != __slave_list) ? \
>> + ((__pos_prev) ? list_entry_rcu(__pos_prev, struct slave, list) : 
>> NULL;) : \
>> + (list_entry_rcu(__prev, struct slave, list)); \
>> + })
>
> Same remark as above about prev.
>
>> +
>>
>>
>> -#define bond_for_each_slave_from(bond, pos, cnt, start) \
>> - for (cnt = 0, pos = start; pos && cnt < (bond)->slave_cnt; \
>> - cnt++, pos = bond_next_slave(bond, pos))
>> -
>> +#define bond_for_each_slave_from(bond, pos, start) \
>> + for (pos = start; pos; (pos = bond_next_slave(bond, pos)) != start ? \
>> + (pos) : (pos = NULL))
>> +
>> +#define bond_for_each_slave_from_rcu(bond, pos, start) \
yes, it is a little tedious. I think it could be more easier and shorter.

>> + for ({struct list_head *__start = &(start)->list; \
>> + struct list_head *__slave_list = &(bond)->slave_list; \
>> + pos = list_entry_rcu(__start, struct slave, list);}; \
>> + pos; \
the only way to get out of the loop is that pos is NULL.
>> + {struct list_head __rcu *__next = list_next_rcu(pos->next); \
>> + __next != __slave_list ? \
>> + __next : __next = list_next_rcu(__next->next); \
first, check whether the pos->next is the last one in the slave_list, if 
it does, get the
first slave of the bond->slave_list.
>>
>> + __next != __start ? \
>> + pos = list_entry_rcu(__next, struct slave, list) : \
>> + pos = NULL; \
second, check whether the pos is reach the start, if not, continue, 
otherwise, the pos
will be set to NULL, so break the loop.
>> + })
>
> Jeez, I don't even want to review it. It's too complex and too hard to
> maintain, even if it works. Can you please make something 
> shorter/easier to
> understand?
>

Best Regards.
Ding

>> +
>>
>> Best regards
>> Ding
>>
>>
>>>> -- 
>>>> To unsubscribe from this list: send the line "unsubsc ribe netdev" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>>>
>>> .
>>>
>>
>>
> -- 
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Veaceslav Falico Sept. 9, 2013, 8:17 p.m. UTC | #7
On Mon, Sep 09, 2013 at 10:53:20PM +0800, Ding Tianhong wrote:
>于 2013/9/9 17:57, Veaceslav Falico 写道:
>>On Mon, Sep 09, 2013 at 04:58:29PM +0800, Ding Tianhong wrote:
>>>On 2013/9/8 14:05, Ding Tianhong wrote:
>>>
>>>Hi Veaceslav and Nik:
>>>
>>>please take a moment to reveiw the function just modify for 
>>>bond_XXX_rcu,
>>>and give me some advice. thanks for the help again.:)
>>>
>>>+#define bond_first_slave_rcu(bond) \
>>>+ list_first_or_null_rcu(&(bond)->slave_list, struct slave, list);
>>>+#define bond_last_slave_rcu(bond) \
>>>+ ({struct list_head *__slave_list = &(bond)->slave_list; \
>>>+ struct list_head __rcu *__prev = \
>>>+ (*((struct list_head __rcu **)(&(__slave_list)->prev)));\
>>>+ likely(__slave_list != __prev) ? \
>>>+ container_of(__prev, struct slave, list) : NULL;})
>>
>>Please take a look at Nikolay's reply to my RCU email -
>>http://www.spinics.net/lists/netdev/msg249805.html . And mine also, 
>>to his
>>email. In short - RCU doesn't guarantee ->prev, so better take the 
>>approach
>>of eliminating bond_last/prev_slave completely.
>>
>yes, I see the message, the list_del_rcu will make the slave->list 
>->prev = LIST_POISON2,
>the bond->slave_list will not be set to the messae, the prev will 
>point a slave->list or itself,
>so I think it will be ok here, please correct me if I miss something.

Hi Ding,

Please take a look at the patchset I've just sent to net-next:

[PATCH net-next 0/26] bonding: use neighbours instead of own lists

It removes all the ->prev usage, and uses completely different primitives
for list traversal, and - *drops* the bond->slave_list and slave->list
completely. I think it would be a lot easier to RCUify bonding further
providing this patchset.

Thanks.

>
>Best Regards
>Ding
>
>>>+
>>>#define bond_is_first_slave(bond, pos) ((pos)->list.prev == 
>>>&(bond)->slave_list)
>>>#define bond_is_last_slave(bond, pos) ((pos)->list.next == 
>>>&(bond)->slave_list)
>>>
>>>@@ -93,6 +117,29 @@
>>>(bond_is_first_slave(bond, pos) ? bond_last_slave(bond) : \
>>>bond_to_slave((pos)->list.prev))
>>>
>>>+/* Since bond_first/last_slave_rcu can return NULL, these can 
>>>return NULL too */
>>>+#define bond_next_slave_rcu(bond, pos) \
>>>+ ({struct list_head *__slave_list = &(bond)->slave_list; \
>>>+ struct list_head __rcu *__next = list_next_rcu(__slave_list); \
>>>+ struct list_head *__pos_list = &(pos)->list; \
>>>+ struct list_head __rcu *__pos_next = list_next_rcu(__pos_list); \
>>>+ likely(__pos_next != __slave_list) ? \
>>>+ container_of(__pos_next, struct slave, list) : \
>>>+ container_of(__next, struct slave, list); \
>>>+ })
>>
>>Nice, but can be shortened - we know that pos won't go away.
>
>OK, clean it soon.
>
>>
>>>+
>>>+#define bond_prev_slave_rcu(bond, pos) \
>>>+ ({struct list_head *__slave_list = &(bond)->slave_list; \
>>>+ struct list_head __rcu *__prev = \
>>>+ (*((struct list_head __rcu **)(&(__slave_list)->prev)));\
>>>+ struct list_head *__pos_list = &(pos)->list; \
>>>+ struct list_head __rcu *__pos_prev = (__pos_list->prev 
>>>!=LIST_POISON2) ? \
>yes, the pos->list will be set to LIST_POISON2 by list_del_rcu, so I 
>add a check for it, But
>take the approach of eliminating bond_last/prev_slave completely is a 
>wise decision, I agree.
>
>>>+ (*((struct list_head __rcu **)(&(__pos_list)->prev))) : NULL; \
>>>+ likely(__pos_prev != __slave_list) ? \
>>>+ ((__pos_prev) ? list_entry_rcu(__pos_prev, struct slave, list) 
>>>: NULL;) : \
>>>+ (list_entry_rcu(__prev, struct slave, list)); \
>>>+ })
>>
>>Same remark as above about prev.
>>
>>>+
>>>
>>>
>>>-#define bond_for_each_slave_from(bond, pos, cnt, start) \
>>>- for (cnt = 0, pos = start; pos && cnt < (bond)->slave_cnt; \
>>>- cnt++, pos = bond_next_slave(bond, pos))
>>>-
>>>+#define bond_for_each_slave_from(bond, pos, start) \
>>>+ for (pos = start; pos; (pos = bond_next_slave(bond, pos)) != start ? \
>>>+ (pos) : (pos = NULL))
>>>+
>>>+#define bond_for_each_slave_from_rcu(bond, pos, start) \
>yes, it is a little tedious. I think it could be more easier and shorter.
>
>>>+ for ({struct list_head *__start = &(start)->list; \
>>>+ struct list_head *__slave_list = &(bond)->slave_list; \
>>>+ pos = list_entry_rcu(__start, struct slave, list);}; \
>>>+ pos; \
>the only way to get out of the loop is that pos is NULL.
>>>+ {struct list_head __rcu *__next = list_next_rcu(pos->next); \
>>>+ __next != __slave_list ? \
>>>+ __next : __next = list_next_rcu(__next->next); \
>first, check whether the pos->next is the last one in the slave_list, 
>if it does, get the
>first slave of the bond->slave_list.
>>>
>>>+ __next != __start ? \
>>>+ pos = list_entry_rcu(__next, struct slave, list) : \
>>>+ pos = NULL; \
>second, check whether the pos is reach the start, if not, continue, 
>otherwise, the pos
>will be set to NULL, so break the loop.
>>>+ })
>>
>>Jeez, I don't even want to review it. It's too complex and too hard to
>>maintain, even if it works. Can you please make something 
>>shorter/easier to
>>understand?
>>
>
>Best Regards.
>Ding
>
>>>+
>>>
>>>Best regards
>>>Ding
>>>
>>>
>>>>>-- 
>>>>>To unsubscribe from this list: send the line "unsubsc ribe netdev" in
>>>>>the body of a message to majordomo@vger.kernel.org
>>>>>More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>
>>>>
>>>>.
>>>>
>>>
>>>
>>-- 
>>To unsubscribe from this list: send the line "unsubscribe netdev" in
>>the body of a message to majordomo@vger.kernel.org
>>More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney Sept. 12, 2013, 4:17 p.m. UTC | #8
On Sat, Sep 07, 2013 at 05:03:50PM +0200, Veaceslav Falico wrote:
> On Sat, Sep 07, 2013 at 04:45:05PM +0200, Nikolay Aleksandrov wrote:
> >
> >On 09/07/2013 04:20 PM, Veaceslav Falico wrote:
> >>On Fri, Sep 06, 2013 at 03:28:07PM +0800, Ding Tianhong wrote:
> ...snip...
> >>diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> >>index f4b1001..37b49d1 100644
> >>--- a/include/linux/rculist.h
> >>+++ b/include/linux/rculist.h
> >>@@ -23,6 +23,7 @@
> >>  * way, we must not access it directly
> >>  */
> >> #define list_next_rcu(list)    (*((struct list_head __rcu
> >>**)(&(list)->next)))
> >>+#define list_prev_rcu(list)    (*((struct list_head __rcu
> >>**)(&(list)->prev)))
> >>
> >> /*
> >>  * Insert a new entry between two known consecutive entries.
> >>@@ -271,6 +272,12 @@ static inline void list_splice_init_rcu(struct
> >>list_head *list,
> >>       likely(__ptr != __next) ? container_of(__next, type, member) : NULL; \
> >>     })
> >>
> >>+#define list_last_or_null_rcu(ptr, type, member) \
> >>+    ({struct list_head *__ptr = (ptr); \
> >>+      struct list_head __rcu *__last = list_prev_rcu(__ptr); \
> >>+      likely(__ptr != __last) ? container_of(__prev, type, member) : NULL; \
> >>+    })
> >>+
> >Hi,
> >Actually I don't think you can dereference ->prev and use the standard
> >list_del_rcu because it guarantees only the ->next ptr will be valid and
> >->prev is set to LIST_POISON2.
> >IMO, you'll need something like this: https://lkml.org/lkml/2012/7/25/193
> >with the bidir_del and all that.
> 
> Yeah, right, my bad - we can rely only on the ->next pointer, indeed,
> missed that part. RCU is hard :).
> 
> So it'll be a lot harder to implement bond_last_slave_rcu() in a
> 'straightforward' approach.
> 
> I'd rather go in the opposite direction here - i.e. drop the 'reverse'
> traversal completely, and all the use cases for bond_last_slave_rcu(). I've
> got some patches already - http://patchwork.ozlabs.org/patch/272076/ doing
> that, and hopefully will remove the whole 'backword' traversal completely
> in the future.

If it is important, it would be possible to create an RCU-protected
linked list that allowed readers to go in both directions -- mostly just
leave out the poisoning of the ->prev pointers.  We do -not- want to
do this for the normal RCU-protected linked lists because the poisoning
does find bugs.

However, you would have to be quite careful with a bi-directional
RCU-protected linked list, because p->next->prev will not necessarily
be equal to p, for all the reasons that you guys listed out earlier in
this thread.  So be very careful what you wish for!  ;-)

							Thanx, Paul

> >But in any case I complete agree with Veaceslav here. Read all the
> >documentation carefully :-)
> >
> >Cheers,
> >Nik
> >
> >> /**
> >>  * list_for_each_entry_rcu    -    iterate over rcu list of given type
> >>  * @pos:    the type * to use as a loop cursor.
> >>------- END OF PATCH ------
> >>
> >>Anyway, it's up to you.
> >>
> >>Hope that helps.
> >
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index f4b1001..37b49d1 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -23,6 +23,7 @@ 
   * way, we must not access it directly
   */
  #define list_next_rcu(list)	(*((struct list_head __rcu **)(&(list)->next)))
+#define list_prev_rcu(list)	(*((struct list_head __rcu **)(&(list)->prev)))
  
  /*
   * Insert a new entry between two known consecutive entries.
@@ -271,6 +272,12 @@  static inline void list_splice_init_rcu(struct list_head *list,
  	  likely(__ptr != __next) ? container_of(__next, type, member) : NULL; \
  	})
  
+#define list_last_or_null_rcu(ptr, type, member) \
+	({struct list_head *__ptr = (ptr); \
+	  struct list_head __rcu *__last = list_prev_rcu(__ptr); \
+	  likely(__ptr != __last) ? container_of(__prev, type, member) : NULL; \
+	})
+
  /**
   * list_for_each_entry_rcu	-	iterate over rcu list of given type
   * @pos:	the type * to use as a loop cursor.