Patchwork [2/2] ipvs: Use cond_resched_rcu_lock() helper when dumping connections

Submitter Simon Horman
Date April 26, 2013, 1:45 a.m.
Message ID <1366940708-10180-3-git-send-email-horms@verge.net.au>
Permalink /patch/239613/
State Not Applicable

Comments

Simon Horman - April 26, 2013, 1:45 a.m.
This avoids the situation where a dump of a large number of connections
may prevent scheduling for a long time while also avoiding excessive
calls to rcu_read_unlock() and rcu_read_lock().

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_conn.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)
Ingo Molnar - April 26, 2013, 6:15 a.m.
* Simon Horman <horms@verge.net.au> wrote:

> This avoids the situation where a dump of a large number of connections
> may prevent scheduling for a long time while also avoiding excessive
> calls to rcu_read_unlock() and rcu_read_lock().
> 
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Julian Anastasov <ja@ssi.bg>
> Signed-off-by: Simon Horman <horms@verge.net.au>
> ---
>  net/netfilter/ipvs/ip_vs_conn.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
> index a083bda..42a7b33 100644
> --- a/net/netfilter/ipvs/ip_vs_conn.c
> +++ b/net/netfilter/ipvs/ip_vs_conn.c
> @@ -975,8 +975,7 @@ static void *ip_vs_conn_array(struct seq_file *seq, loff_t pos)
>  				return cp;
>  			}
>  		}
> -		rcu_read_unlock();
> -		rcu_read_lock();
> +		cond_resched_rcu_lock();
>  	}
>  
>  	return NULL;
> @@ -1015,8 +1014,7 @@ static void *ip_vs_conn_seq_next(struct seq_file *seq, void *v, loff_t *pos)
>  			iter->l = &ip_vs_conn_tab[idx];
>  			return cp;
>  		}
> -		rcu_read_unlock();
> -		rcu_read_lock();
> +		cond_resched_rcu_lock();

Feel free to route this via the networking tree.

Note that this change isn't a pure clean-up but has functional effects as 
well: on !PREEMPT or PREEMPT_VOLUNTARY kernels it will add in a potential 
cond_resched() - while previously it would only rcu unlock and relock 
(which in itself isn't scheduling).

This should probably be pointed out in the changelog.

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Zijlstra - April 26, 2013, 8:03 a.m.
On Fri, Apr 26, 2013 at 10:45:08AM +0900, Simon Horman wrote:

> @@ -975,8 +975,7 @@ static void *ip_vs_conn_array(struct seq_file *seq, loff_t pos)
>  				return cp;
>  			}
>  		}
> -		rcu_read_unlock();
> -		rcu_read_lock();
> +		cond_resched_rcu_lock();
>  	}


While I agree with the sentiment, I do find it a somewhat dangerous construct, in
that it might become far too easy to keep an RCU reference over this break and
thus violate the RCU premise.

Is there anything that can detect this? Sparse / coccinelle / smatch? If so it
would be great to add this to these checkers.
Paul E. McKenney - April 26, 2013, 3:45 p.m.
On Fri, Apr 26, 2013 at 10:03:13AM +0200, Peter Zijlstra wrote:
> On Fri, Apr 26, 2013 at 10:45:08AM +0900, Simon Horman wrote:
> 
> > @@ -975,8 +975,7 @@ static void *ip_vs_conn_array(struct seq_file *seq, loff_t pos)
> >  				return cp;
> >  			}
> >  		}
> > -		rcu_read_unlock();
> > -		rcu_read_lock();
> > +		cond_resched_rcu_lock();
> >  	}
> 
> 
> While I agree with the sentiment I do find it a somewhat dangerous construct in
> that it might become far too easy to keep an RCU reference over this break and
> thus violate the RCU premise.
> 
> Is there anything that can detect this? Sparse / cocinelle / smatch? If so it
> would be great to add this to these checkers.

I have done some crude coccinelle patterns in the past, but they are
subject to false positives (from when you transfer the pointer from
RCU protection to reference-count protection) and also false negatives
(when you atomically increment some statistic unrelated to protection).

I could imagine maintaining a per-thread count of the number of outermost
RCU read-side critical sections at runtime, and then associating that
counter with a given pointer at rcu_dereference() time, but this would
require either compiler magic or an API for using a pointer returned
by rcu_dereference().  This API could in theory be enforced by sparse.

Dhaval Giani might have some ideas as well, adding him to CC.

							Thanx, Paul

Eric Dumazet - April 26, 2013, 3:59 p.m.
On Fri, 2013-04-26 at 08:45 -0700, Paul E. McKenney wrote:

> I have done some crude coccinelle patterns in the past, but they are
> subject to false positives (from when you transfer the pointer from
> RCU protection to reference-count protection) and also false negatives
> (when you atomically increment some statistic unrelated to protection).
> 
> I could imagine maintaining a per-thread count of the number of outermost
> RCU read-side critical sections at runtime, and then associating that
> counter with a given pointer at rcu_dereference() time, but this would
> require either compiler magic or an API for using a pointer returned
> by rcu_dereference().  This API could in theory be enforced by sparse.
> 
> Dhaval Giani might have some ideas as well, adding him to CC.


We had this fix the other day, because the tcp prequeue code hit this check:

static inline struct dst_entry *skb_dst(const struct sk_buff *skb)
{
        /* If refdst was not refcounted, check we still are in a 
         * rcu_read_lock section
         */
        WARN_ON((skb->_skb_refdst & SKB_DST_NOREF) &&
                !rcu_read_lock_held() &&
                !rcu_read_lock_bh_held());
        return (struct dst_entry *)(skb->_skb_refdst & SKB_DST_PTRMASK);
}

( http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=093162553c33e9479283e107b4431378271c735d )

The problem is that the RCU-protected pointer was escaping the rcu_read_lock()
section and was then used in another thread.


What would be cool (but maybe expensive) would be to get a cookie from
rcu_read_lock(), and check the cookie at rcu_dereference(). These
cookies would have system-wide scope to catch any kind of error.

Because a per-thread counter would not catch the following problem:

rcu_read_lock();

ptr = rcu_dereference(x);
if (!ptr)
	return NULL;
...


rcu_read_unlock();

...
rcu_read_lock();
/* no reload of x, ptr might be now stale/freed */
if (ptr->field) { ... }



Paul E. McKenney - April 26, 2013, 4:30 p.m.
On Fri, Apr 26, 2013 at 08:59:07AM -0700, Eric Dumazet wrote:
> On Fri, 2013-04-26 at 08:45 -0700, Paul E. McKenney wrote:
> 
> > I have done some crude coccinelle patterns in the past, but they are
> > subject to false positives (from when you transfer the pointer from
> > RCU protection to reference-count protection) and also false negatives
> > (when you atomically increment some statistic unrelated to protection).
> > 
> > I could imagine maintaining a per-thread count of the number of outermost
> > RCU read-side critical sections at runtime, and then associating that
> > counter with a given pointer at rcu_dereference() time, but this would
> > require either compiler magic or an API for using a pointer returned
> > by rcu_dereference().  This API could in theory be enforced by sparse.
> > 
> > Dhaval Giani might have some ideas as well, adding him to CC.
> 
> 
> We had this fix the otherday, because tcp prequeue code hit this check :
> 
> static inline struct dst_entry *skb_dst(const struct sk_buff *skb)
> {
>         /* If refdst was not refcounted, check we still are in a 
>          * rcu_read_lock section
>          */
>         WARN_ON((skb->_skb_refdst & SKB_DST_NOREF) &&
>                 !rcu_read_lock_held() &&
>                 !rcu_read_lock_bh_held());
>         return (struct dst_entry *)(skb->_skb_refdst & SKB_DST_PTRMASK);
> }
> 
> ( http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=093162553c33e9479283e107b4431378271c735d )
> 
> Problem is the rcu protected pointer was escaping the rcu lock and was
> then used in another thread.

All the way to some other thread?  That is a serious escape!  ;-)

> What would be cool (but expensive maybe) , would be to get a cookie from
> rcu_read_lock(), and check the cookie at rcu_dereference(). These
> cookies would have system wide scope to catch any kind of errors.

I suspect that your cookie and my counter are quite similar.

> Because a per thread counter would not catch following problem :
> 
> rcu_read_lock();
> 
> ptr = rcu_dereference(x);
> if (!ptr)
> 	return NULL;
> ...
> 
> 
> rcu_read_unlock();
> 
> ...
> rcu_read_lock();
> /* no reload of x, ptr might be now stale/freed */
> if (ptr->field) { ... }

Well, that is why I needed to appeal to compiler magic or an API
extension.  Either way, the pointer returned from rcu_dereference()
must be tagged with the ID of the outermost rcu_read_lock() that
covers it.  Then that tag must be checked against that of the outermost
rcu_read_lock() when it is dereferenced.  If we introduced yet another
set of RCU API members (shudder!) with which to wrapper your
ptr->field use, we could associate additional storage with the
pointer to hold the cookie/counter.  And then use sparse to enforce
use of the new API.

Of course, if we were using C++, we could define a template for pointers
returned by rcu_dereference() in order to make this work.  Now, where
did I put that asbestos suit, you know, the one with the titanium
pinstripes?  ;-)

But even that has some "interesting" issues.  To see this, let's
modify your example a bit:

	rcu_read_lock();

	...

	rcu_read_lock_bh();

	ptr = rcu_dereference(x);
	if (!ptr)
		return NULL;
	...


	rcu_read_unlock_bh();

	...
	/* no reload of x, ptr might be now stale/freed */
	if (ptr->field) { ... }

	...

	rcu_read_unlock();


Now, is it the rcu_read_lock() or the rcu_read_lock_bh() that protects
the rcu_dereference()?

							Thanx, Paul

Peter Zijlstra - April 26, 2013, 5:19 p.m.
On Fri, Apr 26, 2013 at 08:45:47AM -0700, Paul E. McKenney wrote:
> On Fri, Apr 26, 2013 at 10:03:13AM +0200, Peter Zijlstra wrote:
> > On Fri, Apr 26, 2013 at 10:45:08AM +0900, Simon Horman wrote:
> > 
> > > @@ -975,8 +975,7 @@ static void *ip_vs_conn_array(struct seq_file *seq, loff_t pos)
> > >  				return cp;
> > >  			}
> > >  		}
> > > -		rcu_read_unlock();
> > > -		rcu_read_lock();
> > > +		cond_resched_rcu_lock();
> > >  	}
> > 
> > 
> > While I agree with the sentiment I do find it a somewhat dangerous construct in
> > that it might become far too easy to keep an RCU reference over this break and
> > thus violate the RCU premise.
> > 
> > Is there anything that can detect this? Sparse / cocinelle / smatch? If so it
> > would be great to add this to these checkers.
> 
> I have done some crude coccinelle patterns in the past, but they are
> subject to false positives (from when you transfer the pointer from
> RCU protection to reference-count protection) and also false negatives
> (when you atomically increment some statistic unrelated to protection).
> 
> I could imagine maintaining a per-thread count of the number of outermost
> RCU read-side critical sections at runtime, and then associating that
> counter with a given pointer at rcu_dereference() time, but this would
> require either compiler magic or an API for using a pointer returned
> by rcu_dereference().  This API could in theory be enforced by sparse.

Luckily cond_resched_rcu_lock() will typically only occur within loops, and
loops tend to be contained in a single source file.

This would suggest a simple static checker should be able to tell without too
much magic right? All it needs to do is track pointers returned from
rcu_dereference*() and see if they're used after cond_resched_rcu_lock().

Also, cond_resched_rcu_lock() will only drop a single level of RCU refs; so
that should be easier still.
Paul E. McKenney - April 26, 2013, 5:48 p.m.
On Fri, Apr 26, 2013 at 07:19:49PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 26, 2013 at 08:45:47AM -0700, Paul E. McKenney wrote:
> > On Fri, Apr 26, 2013 at 10:03:13AM +0200, Peter Zijlstra wrote:
> > > On Fri, Apr 26, 2013 at 10:45:08AM +0900, Simon Horman wrote:
> > > 
> > > > @@ -975,8 +975,7 @@ static void *ip_vs_conn_array(struct seq_file *seq, loff_t pos)
> > > >  				return cp;
> > > >  			}
> > > >  		}
> > > > -		rcu_read_unlock();
> > > > -		rcu_read_lock();
> > > > +		cond_resched_rcu_lock();
> > > >  	}
> > > 
> > > 
> > > While I agree with the sentiment I do find it a somewhat dangerous construct in
> > > that it might become far too easy to keep an RCU reference over this break and
> > > thus violate the RCU premise.
> > > 
> > > Is there anything that can detect this? Sparse / cocinelle / smatch? If so it
> > > would be great to add this to these checkers.
> > 
> > I have done some crude coccinelle patterns in the past, but they are
> > subject to false positives (from when you transfer the pointer from
> > RCU protection to reference-count protection) and also false negatives
> > (when you atomically increment some statistic unrelated to protection).
> > 
> > I could imagine maintaining a per-thread count of the number of outermost
> > RCU read-side critical sections at runtime, and then associating that
> > counter with a given pointer at rcu_dereference() time, but this would
> > require either compiler magic or an API for using a pointer returned
> > by rcu_dereference().  This API could in theory be enforced by sparse.
> 
> Luckily cond_resched_rcu_lock() will typically only occur within loops, and
> loops tend to be contained in a single sourcefile.
> 
> This would suggest a simple static checker should be able to tell without too
> much magic right? All it needs to do is track pointers returned from
> rcu_dereference*() and see if they're used after cond_resched_rcu_lock().
> 
> Also, cond_resched_rcu_lock() will only drop a single level of RCU refs; so
> that should be easier still.

Don't get me wrong, I am not opposing cond_resched_rcu_lock() because it
will be difficult to validate.  For one thing, until there are a lot of
them, manual inspection is quite possible.  So feel free to apply my
Acked-by to the patch.

But it is definitely not too early to start thinking about how best to
automatically validate this sort of thing!

							Thanx, Paul

Eric Dumazet - April 26, 2013, 6:26 p.m.
On Fri, 2013-04-26 at 10:48 -0700, Paul E. McKenney wrote:

> Don't get me wrong, I am not opposing cond_resched_rcu_lock() because it
> will be difficult to validate.  For one thing, until there are a lot of
> them, manual inspection is quite possible.  So feel free to apply my
> Acked-by to the patch.

One question: if some threads are calling rcu_barrier() and waiting for us
to exit an rcu_read_lock() section, is need_resched() enough to allow
breaking the section?

If not, maybe we should not test need_resched() at all.

rcu_read_unlock();
cond_resched();
rcu_read_lock();



Paul E. McKenney - April 26, 2013, 7:04 p.m.
On Fri, Apr 26, 2013 at 11:26:55AM -0700, Eric Dumazet wrote:
> On Fri, 2013-04-26 at 10:48 -0700, Paul E. McKenney wrote:
> 
> > Don't get me wrong, I am not opposing cond_resched_rcu_lock() because it
> > will be difficult to validate.  For one thing, until there are a lot of
> > them, manual inspection is quite possible.  So feel free to apply my
> > Acked-by to the patch.
> 
> One question : If some thread(s) is(are) calling rcu_barrier() and
> waiting we exit from rcu_read_lock() section, is need_resched() enough
> for allowing to break the section ?
> 
> If not, maybe we should not test need_resched() at all.
> 
> rcu_read_unlock();
> cond_resched();
> rcu_read_lock();

A call to rcu_barrier() only blocks on already-queued RCU callbacks, so if
there are no RCU callbacks queued in the system, it need not block at all.

But it might need to wait on some callbacks, and thus might need to
wait for a grace period.  So, is cond_resched() sufficient?
Currently, it depends:

1.	CONFIG_TINY_RCU: Here cond_resched() doesn't do anything unless
	there is at least one other process that is at an appropriate
	priority level.  So if the system has absolutely nothing else
	to do other than run the in-kernel loop containing the
	cond_resched_rcu_lock(), the grace period will never end.

	But as soon as some other process wakes up, there will be a
	context switch and the grace period will end.  Unless you
	are running at some high real-time priority, in which case
	either throttling kicks in after a second or so or you get
	what you deserve.  ;-)

	So for any reasonable workload, cond_resched() will eventually
	suffice.

2.	CONFIG_TREE_RCU without adaptive ticks (which is not yet in
	tree):  Same as #1, except that there is a greater chance
	that the eventual wakeup might happen on some other CPU.

3.	CONFIG_TREE_RCU with adaptive ticks (once it makes it into
	mainline):  After a few jiffies, RCU will kick the offending
	CPU, which will turn on the scheduling-clock interrupt.
	This won't end the grace period, but the kick could do a
	bit more if needed.

4.	CONFIG_TREE_PREEMPT_RCU:  When the next scheduling-clock
	interrupt notices that it happened in an RCU read-side
	critical section and that there is a grace period pending,
	it will set a flag in the task structure.  The next
	rcu_read_unlock() will report a quiescent state to the
	RCU core.

So perhaps RCU should do a bit more in cases #2 and #3.  It used to
send a resched IPI in this case, but if there is no reason to
reschedule, the resched IPI does nothing.  In the worst case, I
can fire up a prio 99 kthread on each CPU and send that kthread a
wakeup from RCU's rcu_gp_fqs() code.

Other thoughts?

							Thanx, Paul

Peter Zijlstra - April 27, 2013, 7:18 a.m.
> In the worst case, I
> can fire up a prio 99 kthread on each CPU and send that kthread a
> wakeup from RCU's rcu_gp_fqs() code.

You just know that's going to be _so_ popular ;-)
Julian Anastasov - April 27, 2013, 11:32 a.m.
Hello,

On Fri, 26 Apr 2013, Eric Dumazet wrote:

> On Fri, 2013-04-26 at 10:48 -0700, Paul E. McKenney wrote:
> 
> > Don't get me wrong, I am not opposing cond_resched_rcu_lock() because it
> > will be difficult to validate.  For one thing, until there are a lot of
> > them, manual inspection is quite possible.  So feel free to apply my
> > Acked-by to the patch.
> 
> One question : If some thread(s) is(are) calling rcu_barrier() and
> waiting we exit from rcu_read_lock() section, is need_resched() enough
> for allowing to break the section ?
> 
> If not, maybe we should not test need_resched() at all.
> 
> rcu_read_unlock();
> cond_resched();
> rcu_read_lock();

	So, I assume, to help realtime kernels and rcu_barrier,
it is not a good idea to guard rcu_read_unlock with checks.
I see that rcu_read_unlock will try to reschedule in the
!CONFIG_PREEMPT_RCU case (via preempt_enable); can we
use ifdefs to avoid a double TIF_NEED_RESCHED check?:

	rcu_read_unlock();
#if !defined(CONFIG_PREEMPT) || defined(CONFIG_PREEMPT_RCU)
	cond_resched();
#endif
	rcu_read_lock();

Regards

--
Julian Anastasov <ja@ssi.bg>
Paul E. McKenney - April 27, 2013, 4:17 p.m.
On Sat, Apr 27, 2013 at 09:18:15AM +0200, Peter Zijlstra wrote:
> > In the worst case, I
> > can fire up a prio 99 kthread on each CPU and send that kthread a
> > wakeup from RCU's rcu_gp_fqs() code.
> 
> You just know that's going to be _so_ popular ;-)

;-) ;-) ;-)

I must confess that I would prefer a somewhat less heavy-handed approach.

							Thanx, Paul

Paul E. McKenney - April 27, 2013, 4:20 p.m.
On Sat, Apr 27, 2013 at 02:32:48PM +0300, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Fri, 26 Apr 2013, Eric Dumazet wrote:
> 
> > On Fri, 2013-04-26 at 10:48 -0700, Paul E. McKenney wrote:
> > 
> > > Don't get me wrong, I am not opposing cond_resched_rcu_lock() because it
> > > will be difficult to validate.  For one thing, until there are a lot of
> > > them, manual inspection is quite possible.  So feel free to apply my
> > > Acked-by to the patch.
> > 
> > One question : If some thread(s) is(are) calling rcu_barrier() and
> > waiting we exit from rcu_read_lock() section, is need_resched() enough
> > for allowing to break the section ?
> > 
> > If not, maybe we should not test need_resched() at all.
> > 
> > rcu_read_unlock();
> > cond_resched();
> > rcu_read_lock();
> 
> 	So, I assume, to help realtime kernels and rcu_barrier
> it is not a good idea to guard rcu_read_unlock with checks.
> I see that rcu_read_unlock will try to reschedule in the 
> !CONFIG_PREEMPT_RCU case (via preempt_enable), can we
> use ifdefs to avoid double TIF_NEED_RESCHED check?:
> 
> 	rcu_read_unlock();
> #if !defined(CONFIG_PREEMPT) || defined(CONFIG_PREEMPT_RCU)

I would instead suggest something like:

#ifndef CONFIG_PREEMPT_RCU

But yes, in the CONFIG_PREEMPT_RCU case, the cond_resched() is not
needed.

							Thanx, Paul

> 	cond_resched();
> #endif
> 	rcu_read_lock();
> 
> Regards
> 
> --
> Julian Anastasov <ja@ssi.bg>
> 

Julian Anastasov - April 29, 2013, 9:08 p.m.
Hello,

On Sat, 27 Apr 2013, Paul E. McKenney wrote:

> On Sat, Apr 27, 2013 at 02:32:48PM +0300, Julian Anastasov wrote:
> > 
> > 	So, I assume, to help realtime kernels and rcu_barrier
> > it is not a good idea to guard rcu_read_unlock with checks.
> > I see that rcu_read_unlock will try to reschedule in the 
> > !CONFIG_PREEMPT_RCU case (via preempt_enable), can we
> > use ifdefs to avoid double TIF_NEED_RESCHED check?:
> > 
> > 	rcu_read_unlock();
> > #if !defined(CONFIG_PREEMPT) || defined(CONFIG_PREEMPT_RCU)
> 
> I would instead suggest something like:
> 
> #ifndef CONFIG_PREEMPT_RCU
> 
> But yes, in the CONFIG_PREEMPT_RCU case, the cond_resched() is not
> needed.

	Hm, is this correct? If I follow the ifdefs,
preempt_schedule() is called when CONFIG_PREEMPT is
defined _and_ CONFIG_PREEMPT_RCU is not defined.
Your example for CONFIG_PREEMPT_RCU is the opposite of this?

> 							Thanx, Paul
> 
> > 	cond_resched();
> > #endif
> > 	rcu_read_lock();

Regards

--
Julian Anastasov <ja@ssi.bg>
Paul E. McKenney - April 29, 2013, 9:30 p.m.
On Tue, Apr 30, 2013 at 12:08:18AM +0300, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Sat, 27 Apr 2013, Paul E. McKenney wrote:
> 
> > On Sat, Apr 27, 2013 at 02:32:48PM +0300, Julian Anastasov wrote:
> > > 
> > > 	So, I assume, to help realtime kernels and rcu_barrier
> > > it is not a good idea to guard rcu_read_unlock with checks.
> > > I see that rcu_read_unlock will try to reschedule in the 
> > > !CONFIG_PREEMPT_RCU case (via preempt_enable), can we
> > > use ifdefs to avoid double TIF_NEED_RESCHED check?:
> > > 
> > > 	rcu_read_unlock();
> > > #if !defined(CONFIG_PREEMPT) || defined(CONFIG_PREEMPT_RCU)
> > 
> > I would instead suggest something like:
> > 
> > #ifndef CONFIG_PREEMPT_RCU
> > 
> > But yes, in the CONFIG_PREEMPT_RCU case, the cond_resched() is not
> > needed.
> 
> 	Hm, is this correct? If I follow the ifdefs
> preempt_schedule is called when CONFIG_PREEMPT is
> defined _and_ CONFIG_PREEMPT_RCU is not defined.
> Your example for CONFIG_PREEMPT_RCU is the opposite to this?

Yep, I really did intend to say "#ifndef CONFIG_PREEMPT_RCU".

A couple of things to keep in mind:

1.	Although rcu_read_unlock() does map to preempt_enable() for
	CONFIG_TINY_RCU and CONFIG_TREE_RCU, the current Kconfig refuses
	to allow either CONFIG_TINY_RCU or CONFIG_TREE_RCU to be selected
	if CONFIG_PREEMPT=y.

2.	In the CONFIG_PREEMPT_RCU case, __rcu_read_unlock() will check
	to see if the RCU core needs to be informed, so there is no
	need to invoke cond_resched() in that case.

3.	If we drop your "|| defined(CONFIG_PREEMPT_RCU)", we get an
	almost-synonym for my "#ifndef CONFIG_PREEMPT_RCU".  The "almost"
	applies to older kernels due to the possibility of having a
	CONFIG_TINY_PREEMPT_RCU kernel -- but this possibility is going
	away soon.

Make sense?

							Thanx, Paul

> > > 	cond_resched();
> > > #endif
> > > 	rcu_read_lock();
> 
> Regards
> 
> --
> Julian Anastasov <ja@ssi.bg>
> 

Julian Anastasov - April 29, 2013, 11:12 p.m.
Hello,

On Mon, 29 Apr 2013, Paul E. McKenney wrote:

> On Tue, Apr 30, 2013 at 12:08:18AM +0300, Julian Anastasov wrote:
> > 
> > 	Hello,
> > 
> > On Sat, 27 Apr 2013, Paul E. McKenney wrote:
> > 
> > > I would instead suggest something like:
> > > 
> > > #ifndef CONFIG_PREEMPT_RCU
> > > 
> > > But yes, in the CONFIG_PREEMPT_RCU case, the cond_resched() is not
> > > needed.
> > 
> > 	Hm, is this correct? If I follow the ifdefs
> > preempt_schedule is called when CONFIG_PREEMPT is
> > defined _and_ CONFIG_PREEMPT_RCU is not defined.
> > Your example for CONFIG_PREEMPT_RCU is the opposite to this?
> 
> Yep, I really did intend to say "#ifndef CONFIG_PREEMPT_RCU".
> 
> A couple of things to keep in mind:
> 
> 1.	Although rcu_read_unlock() does map to preempt_enable() for
> 	CONFIG_TINY_RCU and CONFIG_TREE_RCU, the current Kconfig refuses
> 	to allow either CONFIG_TINY_RCU or CONFIG_TREE_RCU to be selected
> 	if CONFIG_PREEMPT=y.

	I see, CONFIG_PREEMPT_RCU depends on CONFIG_PREEMPT

> 2.	In the CONFIG_PREEMPT_RCU case, __rcu_read_unlock() will check
> 	to see if the RCU core needs to be informed, so there is no
> 	need to invoke cond_resched() in that case.

	OK

> 3.	If we drop your "|| defined(CONFIG_PREEMPT_RCU)", we get an
> 	almost-synonym for my "#ifndef CONFIG_PREEMPT_RCU".  The "almost"
> 	applies to older kernels due to the possibility of having a
> 	CONFIG_TINY_PREEMPT_RCU kernel -- but this possibility is going
> 	away soon.
> 
> Make sense?

	Yes, thanks for the explanation!

	Simon, so lets do it as suggested by Eric and Paul:

	rcu_read_unlock();
#ifndef CONFIG_PREEMPT_RCU
	cond_resched();
#endif
	rcu_read_lock();

Regards

--
Julian Anastasov <ja@ssi.bg>
Simon Horman - April 30, 2013, 2:45 a.m.
On Fri, Apr 26, 2013 at 08:15:34AM +0200, Ingo Molnar wrote:
> 
> * Simon Horman <horms@verge.net.au> wrote:
> 
> > This avoids the situation where a dump of a large number of connections
> > may prevent scheduling for a long time while also avoiding excessive
> > calls to rcu_read_unlock() and rcu_read_lock().
> > 
> > Cc: Eric Dumazet <eric.dumazet@gmail.com>
> > Cc: Julian Anastasov <ja@ssi.bg>
> > Signed-off-by: Simon Horman <horms@verge.net.au>
> > ---
> >  net/netfilter/ipvs/ip_vs_conn.c | 6 ++----
> >  1 file changed, 2 insertions(+), 4 deletions(-)
> > 
> > diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
> > index a083bda..42a7b33 100644
> > --- a/net/netfilter/ipvs/ip_vs_conn.c
> > +++ b/net/netfilter/ipvs/ip_vs_conn.c
> > @@ -975,8 +975,7 @@ static void *ip_vs_conn_array(struct seq_file *seq, loff_t pos)
> >  				return cp;
> >  			}
> >  		}
> > -		rcu_read_unlock();
> > -		rcu_read_lock();
> > +		cond_resched_rcu_lock();
> >  	}
> >  
> >  	return NULL;
> > @@ -1015,8 +1014,7 @@ static void *ip_vs_conn_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> >  			iter->l = &ip_vs_conn_tab[idx];
> >  			return cp;
> >  		}
> > -		rcu_read_unlock();
> > -		rcu_read_lock();
> > +		cond_resched_rcu_lock();
> 
> Feel free to route this via the networking tree.
> 
> Note that this change isn't a pure clean-up but has functional effects as 
> well: on !PREEMPT or PREEMPT_VOLUNTARY kernels it will add in a potential 
> cond_resched() - while previously it would only rcu unlock and relock 
> (which in itself isn't scheduling).
> 
> This should probably be pointed out in the changelog.

Thanks, I have added some text which I will include in v2.

Patch

diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index a083bda..42a7b33 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -975,8 +975,7 @@  static void *ip_vs_conn_array(struct seq_file *seq, loff_t pos)
 				return cp;
 			}
 		}
-		rcu_read_unlock();
-		rcu_read_lock();
+		cond_resched_rcu_lock();
 	}
 
 	return NULL;
@@ -1015,8 +1014,7 @@  static void *ip_vs_conn_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 			iter->l = &ip_vs_conn_tab[idx];
 			return cp;
 		}
-		rcu_read_unlock();
-		rcu_read_lock();
+		cond_resched_rcu_lock();
 	}
 	iter->l = NULL;
 	return NULL;