Patchwork [net-next,1/2] net: fix LLS debug_smp_processor_id() warning

login
register
mail settings
Submitter Eliezer Tamir
Date June 28, 2013, 12:59 p.m.
Message ID <20130628125926.14419.89905.stgit@ladj378.jer.intel.com>
Download mbox | patch
Permalink /patch/255362/
State Accepted
Delegated to: David Miller
Headers show

Comments

Eliezer Tamir - June 28, 2013, 12:59 p.m.
Our use of sched_clock is OK because we don't mind the side effects
of calling it and occasionally waking up on a different CPU.

When CONFIG_DEBUG_PREEMPT is on, disable preempt before calling
sched_clock() so we don't trigger a debug_smp_processor_id() warning.

Reported-by: Cody P Schafer <devel-lists@codyps.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
---

 include/net/ll_poll.h |   30 +++++++++++++++++++++++++-----
 1 files changed, 25 insertions(+), 5 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Hutchings - June 28, 2013, 4:51 p.m.
On Fri, 2013-06-28 at 15:59 +0300, Eliezer Tamir wrote:
> Our use of sched_clock is OK because we don't mind the side effects
> of calling it and occasionally waking up on a different CPU.

Sure about that?  Jitter matters too.

> When CONFIG_DEBUG_PREEMPT is on, disable preempt before calling
> sched_clock() so we don't trigger a debug_smp_processor_id() warning.
[...]

I think this is papering over a problem.  The warning is there for a
good reason.

Would functions like these make it possible to use sched_clock() safely
for polling?  (I didn't spend much time thinking about the names.)

struct sched_timestamp {
	int cpu;
	unsigned long long clock;
};

static inline struct sched_timestamp sched_get_timestamp(void)
{
	struct sched_timestamp ret;

	preempt_disable_notrace();
	ret.cpu = smp_processor_id();
	ret.clock = sched_clock();
	preempt_enable_no_resched_notrace();

	return ret;
}

static inline bool sched_timeout_or_moved(struct sched_timestamp start,
					  unsigned long long timeout)
{
	bool ret;

	preempt_disable_notrace();
	ret = start.cpu != smp_processor_id() ||
		(sched_clock() - start.clock) > timeout;
	preempt_enable_no_resched_notrace();

	return ret;
}

Ben.
Eliezer Tamir - June 29, 2013, 6:50 p.m.
On 28/06/2013 19:51, Ben Hutchings wrote:
> On Fri, 2013-06-28 at 15:59 +0300, Eliezer Tamir wrote:
>> Our use of sched_clock is OK because we don't mind the side effects
>> of calling it and occasionally waking up on a different CPU.
>
> Sure about that?  Jitter matters too.
>
Pretty sure, this is a limit on how long we poll, it's for fairness to
the rest of the system not for performance of this code.

What matters is that on average you are bounded by something close to
what the user specified. If once in a while you run less because of
clock jitter, or even twice the specified time, it's no big deal.

So I don't see how jitter would matter.

And if your workload is jitter sensitive, you should probably be
pinning tasks to CPUs anyway.


>> When CONFIG_DEBUG_PREEMPT is on, disable preempt before calling
>> sched_clock() so we don't trigger a debug_smp_processor_id() warning.
> [...]
>
> I think this is papering over a problem.  The warning is there for a
> good reason.

I think we understand the warning, and that we are OK with the effects.

looking at how other users in the kernel solved this issue
It seems like this is what they do.
for example trace/ring_buffer.c:ring_buffer_time_Stamp()

Also kvm_clock_read() and xen_clokcsource_read() seem to disable preempt
just to silence this warning.

If they really cared about reading the value on one CPU, then being
scheduled on another they should have disabled interrupts.
or am I missing something?

> Would functions like these make it possible to use sched_clock() safely
> for polling?  (I didn't spend much time thinking about the names.)
>
> struct sched_timestamp {
> 	int cpu;
> 	unsigned long long clock;
> };
>
> static inline struct sched_timestamp sched_get_timestamp(void)
> {
> 	struct sched_timestamp ret;
>
> 	preempt_disable_notrace();
> 	ret.cpu = smp_processor_id();
> 	ret.clock = sched_clock();
> 	preempt_enable_no_resched_notrace();
>
> 	return ret;
> }

I don't understand, preempt_disable() only makes prevents preempt
from taking the CPU away from you, you could still lose it for
other reasons.
You would really need to disable interrupts in this region to be
sure that it all executed on the same CPU.

-Eliezer
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Hutchings - July 1, 2013, 7:48 p.m.
On Sat, 2013-06-29 at 21:50 +0300, Eliezer Tamir wrote:
> On 28/06/2013 19:51, Ben Hutchings wrote:
> > On Fri, 2013-06-28 at 15:59 +0300, Eliezer Tamir wrote:
> >> Our use of sched_clock is OK because we don't mind the side effects
> >> of calling it and occasionally waking up on a different CPU.
> >
> > Sure about that?  Jitter matters too.
> >
> Pretty sure, this is a limit on how long we poll, it's for fairness to
> the rest of the system not for performance of this code.
> 
> What matters is that on average you are bounded by something close to
> what the user specified. If once in a while you run less because of
> clock jitter, or even twice the specified time, it's no big deal.

So what is the maximum time difference in sched_clock() values between
CPUs in the same system?

> So I don't see how jitter would matter.
> 
> And if your workload is jitter sensitive, you should probably be
> pinning tasks to CPUs anyway.

Yes, they should be pinned.  But I think a single task that isn't pinned
could poll for significantly longer than intended and the effect
wouldn't be limited to that one task.

> >> When CONFIG_DEBUG_PREEMPT is on, disable preempt before calling
> >> sched_clock() so we don't trigger a debug_smp_processor_id() warning.
> > [...]
> >
> > I think this is papering over a problem.  The warning is there for a
> > good reason.
> 
> I think we understand the warning, and that we are OK with the effects.
> 
> looking at how other users in the kernel solved this issue
> It seems like this is what they do.
> for example trace/ring_buffer.c:ring_buffer_time_Stamp()

Well, this seems to be debug information, not actually used for timing.

> Also kvm_clock_read() and xen_clokcsource_read() seem to disable preempt
> just to silence this warning.

I can't see what you're referring to.

> If they really cared about reading the value on one CPU, then being
> scheduled on another they should have disabled interrupts.
> or am I missing something?
> 
> > Would functions like these make it possible to use sched_clock() safely
> > for polling?  (I didn't spend much time thinking about the names.)
[...]
> I don't understand, preempt_disable() only makes prevents preempt
> from taking the CPU away from you, you could still lose it for
> other reasons.
> You would really need to disable interrupts in this region to be
> sure that it all executed on the same CPU.

Hmm, yes you're right.  Anyway, what I was trying to suggest was that
you would record the current CPU along with the initial timestamp, and
then abort low-latency polling if either the task is moved onto a
different CPU or the time limit was reached.  Checking the CPU first
means that the sched_clock() comparison is valid.

Ben.

Patch

diff --git a/include/net/ll_poll.h b/include/net/ll_poll.h
index 5bf2b3a..6d45e6f 100644
--- a/include/net/ll_poll.h
+++ b/include/net/ll_poll.h
@@ -37,20 +37,40 @@  extern unsigned int sysctl_net_ll_poll __read_mostly;
 #define LL_FLUSH_FAILED		-1
 #define LL_FLUSH_BUSY		-2
 
-/* we can use sched_clock() because we don't care much about precision
+/* a wrapper to make debug_smp_processor_id() happy
+ * we can use sched_clock() because we don't care much about precision
  * we only care that the average is bounded
- * we don't mind a ~2.5% imprecision so <<10 instead of *1000
+ */
+#ifdef CONFIG_DEBUG_PREEMPT
+static inline u64 ll_sched_clock(void)
+{
+	u64 rc;
+
+	preempt_disable_notrace();
+	rc = sched_clock();
+	preempt_enable_no_resched_notrace();
+
+	return rc;
+}
+#else /* CONFIG_DEBUG_PREEMPT */
+static inline u64 ll_sched_clock(void)
+{
+	return sched_clock();
+}
+#endif /* CONFIG_DEBUG_PREEMPT */
+
+/* we don't mind a ~2.5% imprecision so <<10 instead of *1000
  * sk->sk_ll_usec is a u_int so this can't overflow
  */
 static inline u64 ll_sk_end_time(struct sock *sk)
 {
-	return ((u64)ACCESS_ONCE(sk->sk_ll_usec) << 10) + sched_clock();
+	return ((u64)ACCESS_ONCE(sk->sk_ll_usec) << 10) + ll_sched_clock();
 }
 
 /* in poll/select we use the global sysctl_net_ll_poll value */
 static inline u64 ll_end_time(void)
 {
-	return ((u64)ACCESS_ONCE(sysctl_net_ll_poll) << 10) + sched_clock();
+	return ((u64)ACCESS_ONCE(sysctl_net_ll_poll) << 10) + ll_sched_clock();
 }
 
 static inline bool sk_valid_ll(struct sock *sk)
@@ -61,7 +81,7 @@  static inline bool sk_valid_ll(struct sock *sk)
 
 static inline bool can_poll_ll(u64 end_time)
 {
-	return !time_after64(sched_clock(), end_time);
+	return !time_after64(ll_sched_clock(), end_time);
 }
 
 /* when used in sock_poll() nonblock is known at compile time to be true