CPU scheduler to TXQ binding? (ixgbe vs. igb)

Message ID 1411054951.7106.272.camel@edumazet-glaptop2.roam.corp.google.com
State RFC, archived
Delegated to: David Miller

Commit Message

Eric Dumazet Sept. 18, 2014, 3:42 p.m. UTC
On Thu, 2014-09-18 at 06:41 -0700, Eric Dumazet wrote:

> Last but not least, there is the fact that networking stacks use
> mod_timer() to arm timers, and that by default, timer migration is on 
> ( cf /proc/sys/kernel/timer_migration )
> 
> We probably should use mod_timer_pinned(), but I could not really see
> any difference.

Hmm... actually it's quite noticeable:

# ./super_netperf 500 --google-pacing-rate 3000000 -H lpaa24 -l 1000 &
...
# echo 1 >/proc/sys/kernel/timer_migration
# vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0      0 261178336  15812 1001880    0    0     5     1  185  217  0  4 96  0
 0  0      0 261173456  15812 1001884    0    0     0     0 1548055 35472  0 15 85  0
 2  0      0 261174880  15812 1001888    0    0     0     0 1533309 35163  0 15 85  0
 3  0      0 261176768  15812 1001896    0    0     0     0 1533442 35694  0 15 85  0
 2  0      0 261173584  15812 1001912    0    0     0     3 1524024 35489  0 16 83  0
 3  0      0 261173344  15812 1001912    0    0     0     4 1525034 35392  0 15 85  0
 2  0      0 261175840  15812 1001920    0    0     0     0 1545652 35772  0 15 84  0
 3  0      0 261176800  15812 1001920    0    0     0     0 1513413 35703  0 15 85  0
 0  0      0 261175136  15812 1001920    0    0     0     2 1528775 35639  0 15 85  0
 1  0      0 261176480  15812 1001924    0    0     0     0 1510346 35364  0 15 85  0
 0  0      0 261174624  15812 1001924    0    0     0     0 1523893 35669  0 15 85  0
 0  0      0 261175568  15812 1001928    0    0     0     5 1524099 35605  0 15 85  0
 2  0      0 261175776  15812 1001932    0    0     0     5 1510481 35631  0 15 85  0
 2  0      0 261173776  15812 1001932    0    0     0     0 1528381 36127  0 15 84  0
 3  0      0 261175424  15812 1001932    0    0     0     0 1508722 35402  0 15 85  0
 1  0      0 261176048  15812 1001932    0    0     0     0 1495438 35280  0 15 85  0
^C
# echo 0 >/proc/sys/kernel/timer_migration
# vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0      0 261172784  15812 1001936    0    0     5     1  165  228  0  5 95  0
 1  0      0 261175776  15812 1001940    0    0     0     0 1187446 32238  0 12 88  0
 2  0      0 261172752  15812 1001940    0    0     0     3 1166697 32060  0 12 88  0
 1  0      0 261174528  15812 1001944    0    0     0     3 1156846 32048  0 12 88  0
 1  0      0 261172688  15812 1001944    0    0     0     0 1152953 32048  0 12 88  0
 0  0      0 261169888  15812 1001952    0    0     0     0 1143630 32710  0 12 88  0
 2  0      0 261159936  15812 1001748    0    0     0  1016 1153256 32616  0 12 88  0
 2  0      0 261162128  15812 1001936    0    0     0     0 1153065 32689  0 12 88  0
 1  0      0 261171984  15812 1001936    0    0     0     3 1164407 32041  0 12 88  0
 2  0      0 261169552  15812 1001936    0    0     0     5 1162068 31917  0 12 88  0

I am tempted to simply :



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jesper Dangaard Brouer Sept. 18, 2014, 3:59 p.m. UTC | #1
On Thu, 18 Sep 2014 08:42:31 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Thu, 2014-09-18 at 06:41 -0700, Eric Dumazet wrote:
> 
> > Last but not least, there is the fact that networking stacks use
> > mod_timer() to arm timers, and that by default, timer migration is on 
> > ( cf /proc/sys/kernel/timer_migration )

I don't have this proc file on my system, as I didn't select CONFIG_SCHED_DEBUG.
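
For anyone else checking whether the knob is present, a minimal userspace sketch in C follows. The helper name read_sysctl_int() and the macro TIMER_MIGRATION_PATH are made up for illustration; only the proc path itself comes from this thread. It returns -1 when the file is absent, which is what a kernel built without the option would show.

```c
#include <assert.h>
#include <stdio.h>

/* Location of the knob discussed in this thread. */
#define TIMER_MIGRATION_PATH "/proc/sys/kernel/timer_migration"

/*
 * Read a single integer from a sysctl-style file.
 * Returns the value (0 or 1 for timer_migration), or -1 if the file
 * is missing, unreadable, or does not start with an integer.
 * read_sysctl_int() is a hypothetical helper, not an existing API.
 */
static int read_sysctl_int(const char *path)
{
	FILE *f;
	int val;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;	/* e.g. knob not built into this kernel */
	if (fscanf(f, "%d", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}
```

Calling read_sysctl_int(TIMER_MIGRATION_PATH) then gives 1 (migration on), 0 (off), or -1 (no such file).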

> > We probably should use mod_timer_pinned(), but I could not really see
> > any difference.
> 
> Hmm... actually it's quite noticeable:

Interesting impact.

I'm looking for some 1G hardware without multiqueue, so I can get
around this measurement constraint.  And possibly turning it down to
100Mbit/s, so I can more easily measure the HoL blocking effect.


> # ./super_netperf 500 --google-pacing-rate 3000000 -H lpaa24 -l 1000 &
> ...

Interesting option "--google-pacing-rate" ;-)

> # echo 1 >/proc/sys/kernel/timer_migration
> # vmstat 5
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  2  0      0 261178336  15812 1001880    0    0     5     1  185  217  0  4 96  0
>  0  0      0 261173456  15812 1001884    0    0     0     0 1548055 35472  0 15 85  0
>  2  0      0 261174880  15812 1001888    0    0     0     0 1533309 35163  0 15 85  0
>  3  0      0 261176768  15812 1001896    0    0     0     0 1533442 35694  0 15 85  0
[...]

> # echo 0 >/proc/sys/kernel/timer_migration
> # vmstat 5
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  2  0      0 261172784  15812 1001936    0    0     5     1  165  228  0  5 95  0
>  1  0      0 261175776  15812 1001940    0    0     0     0 1187446 32238  0 12 88  0
>  2  0      0 261172752  15812 1001940    0    0     0     3 1166697 32060  0 12 88  0

Quite significant, both interrupts and especially CPU system usage drop.
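
To put a rough number on the interrupt drop, here is a small arithmetic sketch in C over the "in" samples quoted just above. The first row of each vmstat run is left out because it reports averages since boot. The helper names mean() and percent_drop() are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* "in" (interrupts/s) samples quoted above, first vmstat row excluded. */
static const unsigned long in_migration_on[]  = { 1548055, 1533309, 1533442 };
static const unsigned long in_migration_off[] = { 1187446, 1166697 };

/* Arithmetic mean of n samples. */
static double mean(const unsigned long *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	assert(v != NULL && n > 0);
	for (i = 0; i < n; i++)
		sum += (double)v[i];
	return sum / (double)n;
}

/* Relative drop from 'before' to 'after', in percent. */
static double percent_drop(double before, double after)
{
	return 100.0 * (before - after) / before;
}
```

With these samples, percent_drop(mean(in_migration_on, 3), mean(in_migration_off, 2)) comes out at roughly 23%, and the "sy" column going from 15 to 12 is a 20% relative drop.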


> I am tempted to simply :
> 
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 9c3f823e76a9..868c6bcd7221 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2288,10 +2288,10 @@ void sk_send_sigurg(struct sock *sk)
>  }
>  EXPORT_SYMBOL(sk_send_sigurg);
>  
> -void sk_reset_timer(struct sock *sk, struct timer_list* timer,
> +void sk_reset_timer(struct sock *sk, struct timer_list *timer,
>  		    unsigned long expires)
>  {
> -	if (!mod_timer(timer, expires))
> +	if (!mod_timer_pinned(timer, expires))
>  		sock_hold(sk);
>  }
>  EXPORT_SYMBOL(sk_reset_timer);
>
Eric Dumazet Sept. 18, 2014, 4:07 p.m. UTC | #2
On Thu, 2014-09-18 at 08:42 -0700, Eric Dumazet wrote:
> On Thu, 2014-09-18 at 06:41 -0700, Eric Dumazet wrote:
> 
> > Last but not least, there is the fact that networking stacks use
> > mod_timer() to arm timers, and that by default, timer migration is on 
> > ( cf /proc/sys/kernel/timer_migration )
> > 
> > We probably should use mod_timer_pinned(), but I could not really see
> > any difference.
> 
> Hmm... actually it's quite noticeable:
> 
> # ./super_netperf 500 --google-pacing-rate 3000000 -H lpaa24 -l 1000 &
> ...
> # echo 1 >/proc/sys/kernel/timer_migration
> # vmstat 5
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  2  0      0 261178336  15812 1001880    0    0     5     1  185  217  0  4 96  0
>  0  0      0 261173456  15812 1001884    0    0     0     0 1548055 35472  0 15 85  0
>  2  0      0 261174880  15812 1001888    0    0     0     0 1533309 35163  0 15 85  0
>  3  0      0 261176768  15812 1001896    0    0     0     0 1533442 35694  0 15 85  0
>  2  0      0 261173584  15812 1001912    0    0     0     3 1524024 35489  0 16 83  0
>  3  0      0 261173344  15812 1001912    0    0     0     4 1525034 35392  0 15 85  0
>  2  0      0 261175840  15812 1001920    0    0     0     0 1545652 35772  0 15 84  0
>  3  0      0 261176800  15812 1001920    0    0     0     0 1513413 35703  0 15 85  0
>  0  0      0 261175136  15812 1001920    0    0     0     2 1528775 35639  0 15 85  0
>  1  0      0 261176480  15812 1001924    0    0     0     0 1510346 35364  0 15 85  0
>  0  0      0 261174624  15812 1001924    0    0     0     0 1523893 35669  0 15 85  0
>  0  0      0 261175568  15812 1001928    0    0     0     5 1524099 35605  0 15 85  0
>  2  0      0 261175776  15812 1001932    0    0     0     5 1510481 35631  0 15 85  0
>  2  0      0 261173776  15812 1001932    0    0     0     0 1528381 36127  0 15 84  0
>  3  0      0 261175424  15812 1001932    0    0     0     0 1508722 35402  0 15 85  0
>  1  0      0 261176048  15812 1001932    0    0     0     0 1495438 35280  0 15 85  0
> ^C
> # echo 0 >/proc/sys/kernel/timer_migration
> # vmstat 5
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  2  0      0 261172784  15812 1001936    0    0     5     1  165  228  0  5 95  0
>  1  0      0 261175776  15812 1001940    0    0     0     0 1187446 32238  0 12 88  0
>  2  0      0 261172752  15812 1001940    0    0     0     3 1166697 32060  0 12 88  0
>  1  0      0 261174528  15812 1001944    0    0     0     3 1156846 32048  0 12 88  0
>  1  0      0 261172688  15812 1001944    0    0     0     0 1152953 32048  0 12 88  0
>  0  0      0 261169888  15812 1001952    0    0     0     0 1143630 32710  0 12 88  0
>  2  0      0 261159936  15812 1001748    0    0     0  1016 1153256 32616  0 12 88  0
>  2  0      0 261162128  15812 1001936    0    0     0     0 1153065 32689  0 12 88  0
>  1  0      0 261171984  15812 1001936    0    0     0     3 1164407 32041  0 12 88  0
>  2  0      0 261169552  15812 1001936    0    0     0     5 1162068 31917  0 12 88  0
> 
> I am tempted to simply :
> 
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 9c3f823e76a9..868c6bcd7221 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2288,10 +2288,10 @@ void sk_send_sigurg(struct sock *sk)
>  }
>  EXPORT_SYMBOL(sk_send_sigurg);
>  
> -void sk_reset_timer(struct sock *sk, struct timer_list* timer,
> +void sk_reset_timer(struct sock *sk, struct timer_list *timer,
>  		    unsigned long expires)
>  {
> -	if (!mod_timer(timer, expires))
> +	if (!mod_timer_pinned(timer, expires))
>  		sock_hold(sk);
>  }
>  EXPORT_SYMBOL(sk_reset_timer);
> 

And/or changing all occurrences of HRTIMER_MODE_ABS in net/sched
into HRTIMER_MODE_ABS_PINNED

Because we definitely _want_ the qdisc to be restarted on the right CPU.





Eric Dumazet Sept. 18, 2014, 4:34 p.m. UTC | #3
On Thu, 2014-09-18 at 17:59 +0200, Jesper Dangaard Brouer wrote:
> On Thu, 18 Sep 2014 08:42:31 -0700
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > On Thu, 2014-09-18 at 06:41 -0700, Eric Dumazet wrote:
> > 
> > > Last but not least, there is the fact that networking stacks use
> > > mod_timer() to arm timers, and that by default, timer migration is on 
> > > ( cf /proc/sys/kernel/timer_migration )
> 
> I don't have this proc file on my system, as I didn't select CONFIG_SCHED_DEBUG.

Interesting... this timer_migration stuff seems a bit scary to me.

> 
> > > We probably should use mod_timer_pinned(), but I could not really see
> > > any difference.
> > 
> > Hmm... actually it's quite noticeable:
> 
> Interesting impact.
> 
> I'm looking for some 1G hardware without multiqueue, so I can get
> around this measurement constraint.  And possibly turning it down to
> 100Mbit/s, so I can more easily measure the HoL blocking effect.
> 

ethtool -L   eth0    rx 1 tx 1 

(Or similar if combined is used)


> 
> > # ./super_netperf 500 --google-pacing-rate 3000000 -H lpaa24 -l 1000 &
> > ...
> 
> Interesting option "--google-pacing-rate" ;-)

It's using upstream SO_MAX_PACING_RATE, nothing fancy ;)
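
For reference, a minimal sketch in C of how an application might set that socket option. The wrapper name set_max_pacing_rate() is hypothetical. SO_MAX_PACING_RATE takes an unsigned int in bytes per second; whether the netperf option above uses the same unit is an assumption here.

```c
#include <assert.h>
#include <sys/socket.h>

#ifndef SO_MAX_PACING_RATE
/* Value from asm-generic/socket.h; a few architectures use another one. */
#define SO_MAX_PACING_RATE 47
#endif

/*
 * Cap the pacing rate of a socket, in bytes per second.
 * Returns 0 on success, -1 with errno set on failure.
 * set_max_pacing_rate() is a hypothetical wrapper, not an existing API.
 */
static int set_max_pacing_rate(int fd, unsigned int bytes_per_sec)
{
	assert(fd >= 0);
	return setsockopt(fd, SOL_SOCKET, SO_MAX_PACING_RATE,
			  &bytes_per_sec, sizeof(bytes_per_sec));
}
```

A caller would create a TCP socket and call set_max_pacing_rate(fd, 3000000) before sending. As far as I know the limit is only enforced when something paces the flow, e.g. the fq qdisc on the egress device.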

> 
> > # echo 1 >/proc/sys/kernel/timer_migration
> > # vmstat 5
> > procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
> >  2  0      0 261178336  15812 1001880    0    0     5     1  185  217  0  4 96  0
> >  0  0      0 261173456  15812 1001884    0    0     0     0 1548055 35472  0 15 85  0
> >  2  0      0 261174880  15812 1001888    0    0     0     0 1533309 35163  0 15 85  0
> >  3  0      0 261176768  15812 1001896    0    0     0     0 1533442 35694  0 15 85  0
> [...]
> 
> > # echo 0 >/proc/sys/kernel/timer_migration
> > # vmstat 5
> > procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
> >  2  0      0 261172784  15812 1001936    0    0     5     1  165  228  0  5 95  0
> >  1  0      0 261175776  15812 1001940    0    0     0     0 1187446 32238  0 12 88  0
> >  2  0      0 261172752  15812 1001940    0    0     0     3 1166697 32060  0 12 88  0
> 
> Quite significant, both interrupts and especially CPU system usage drop.
> 

Yep...


Jesper Dangaard Brouer Sept. 18, 2014, 6:57 p.m. UTC | #4
On Thu, 18 Sep 2014 09:34:24 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2014-09-18 at 17:59 +0200, Jesper Dangaard Brouer wrote:
> > On Thu, 18 Sep 2014 08:42:31 -0700
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > 
[...]
> > I'm looking for some 1G hardware without multiqueue, so I can get
> > around this measurement constraint.  And possibly turning it down to
> > 100Mbit/s, so I can more easily measure the HoL blocking effect.
> > 
> 
> ethtool -L   eth0    rx 1 tx 1 
> 
> (Or similar if combined is used)

Thanks! - that solves my qdisc measurement problem :-)
And yes, I had to use:

 ethtool -L eth1 combined 1
Patch

diff --git a/net/core/sock.c b/net/core/sock.c
index 9c3f823e76a9..868c6bcd7221 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2288,10 +2288,10 @@  void sk_send_sigurg(struct sock *sk)
 }
 EXPORT_SYMBOL(sk_send_sigurg);
 
-void sk_reset_timer(struct sock *sk, struct timer_list* timer,
+void sk_reset_timer(struct sock *sk, struct timer_list *timer,
 		    unsigned long expires)
 {
-	if (!mod_timer(timer, expires))
+	if (!mod_timer_pinned(timer, expires))
 		sock_hold(sk);
 }
 EXPORT_SYMBOL(sk_reset_timer);