PPPoE performance regression

Message ID	1339595401.11011.48.camel@shinybook.infradead.org
State	RFC, archived
Delegated to:	David Miller
Headers	show Return-Path: <netdev-owner@vger.kernel.org> Message-ID: <1339595401.11011.48.camel@shinybook.infradead.org> Subject: Re: PPPoE performance regression From: David Woodhouse <dwmw2@infradead.org> To: Nathan Williams <nathan@traverse.com.au> Cc: Karl Hiramoto <karl@hiramoto.org>, "David S. Miller" <davem@davemloft.net>, netdev@vger.kernel.org, Paul Mackerras <paulus@samba.org>, John Crispin <blogic@openwrt.org> Date: Wed, 13 Jun 2012 14:50:01 +0100 In-Reply-To: <1339581421.11011.18.camel@shinybook.infradead.org> References: <1339143949.24571.72.camel@dualcore.traverse> <1339144110.13998.1.camel@i7.infradead.org> <1339144954.24571.80.camel@dualcore.traverse> <1339147045.13998.3.camel@i7.infradead.org> <1339289425.2661.27.camel@laptop> <1339317136.2851.54.camel@shinybook.infradead.org> <1339581421.11011.18.camel@shinybook.infradead.org> Content-Type: multipart/signed; micalg="sha1"; protocol="application/x-pkcs7-signature"; boundary="=-vpQ+FiUanU3VLdG0iFJB" Mime-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk

David Woodhouse June 13, 2012, 1:50 p.m. UTC

On Wed, 2012-06-13 at 10:57 +0100, David Woodhouse wrote:
> This doesn't look *so* evil... if the basic concept of using
> skb_orphan() and then setting our own destructor is OK, then I'll work
> out the rest of the details and do it for l2tp too.

Stupid dwmw2. With patch this time...

Benjamin LaHaise June 13, 2012, 3:55 p.m. UTC | #1

On Wed, Jun 13, 2012 at 02:50:01PM +0100, David Woodhouse wrote:
> On Wed, 2012-06-13 at 10:57 +0100, David Woodhouse wrote:
> > This doesn't look *so* evil... if the basic concept of using
> > skb_orphan() and then setting our own destructor is OK, then I'll work
> > out the rest of the details and do it for l2tp too.
> 
> Stupid dwmw2. With patch this time...

Does this actually work?  Could the skb not end up sitting on the receive 
queue of a user socket indefinitely, deferring all further transmits?  From 
an ISP point of view, PPPoE and L2TP are most typically used on links where 
the congestion point is not the local interface the packets are being pumped 
into (think of the vast majority of ethernet based DSL modems), and this 
kind of transmit overhead is a pure waste of CPU cycles.  The only solution 
that generically works in most PPPoE/L2TP situations is to shape outgoing 
traffic to the speed of limiting link.

Maybe there's a PPP extension that does flow control...

		-ben
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Woodhouse June 13, 2012, 4:11 p.m. UTC | #2

On Wed, 2012-06-13 at 11:55 -0400, Benjamin LaHaise wrote:
> Does this actually work?  Could the skb not end up sitting on the
> receive  queue of a user socket indefinitely, deferring all further
> transmits?  From an ISP point of view, 

I haven't tried it; only compiled it. Certainly, the similar approach in
PPPoATM in commit 9d02daf7 *does* work for limiting the bufferbloat and
keeping the queues under control. And it'll let me do BQL for PPPoA.

I'm looking at this from the client side, not the ISP side. And in that
case the local interface *is* the bottleneck. When it's a PPPoE over
br2684 interface and it's full, we should stop the PPP netdev from
spewing packets at it, rather than just dropping them.

On the ISP side if the skb ends up sitting on a receive queue of a user
socket, and nothing is servicing that socket, surely the transmits on
this channel weren't happening anyway?

Benjamin LaHaise June 13, 2012, 4:31 p.m. UTC | #3

On Wed, Jun 13, 2012 at 05:11:34PM +0100, David Woodhouse wrote:
> On Wed, 2012-06-13 at 11:55 -0400, Benjamin LaHaise wrote:
> > Does this actually work?  Could the skb not end up sitting on the
> > receive  queue of a user socket indefinitely, deferring all further
> > transmits?  From an ISP point of view, 
> 
> I haven't tried it; only compiled it. Certainly, the similar approach in
> PPPoATM in commit 9d02daf7 *does* work for limiting the bufferbloat and
> keeping the queues under control. And it'll let me do BQL for PPPoA.
> 
> I'm looking at this from the client side, not the ISP side. And in that
> case the local interface *is* the bottleneck. When it's a PPPoE over
> br2684 interface and it's full, we should stop the PPP netdev from
> spewing packets at it, rather than just dropping them.

I would contend that PPPoE over br2684 is not the common case.  The vast 
majority of users in client mode are going to be using PPPoE over an 
ethernet link to a DSL modem (or cable or wireless radios even).  Just look 
at what DSL modems are available for users in computer stores / what ISPs 
actually ship to their users.  Real ATM exposing devices are rare.

> On the ISP side if the skb ends up sitting on a receive queue of a user
> socket, and nothing is servicing that socket, surely the transmits on
> this channel weren't happening anyway?

True, but it's a design issue we've had to contend with elsewhere in the 
various tunnelling protocols.

Don't get me wrong: I am very much in favour of intelligent queue 
management, but this approach simply does not work for the vast majority 
of PPPoE users, while it adds overhead that will negatively impact access 
concentrators.  If you can somehow restrict the overhead to only impacting 
your use-case, that would be an improvement.

		-ben

David Laight June 13, 2012, 4:32 p.m. UTC | #4

> I would contend that PPPoE over br2684 is not the common case.  The
vast 
> majority of users in client mode are going to be using PPPoE over an 
> ethernet link to a DSL modem (or cable or wireless radios even).  Just
look 
> at what DSL modems are available for users in computer stores / what
ISPs 
> actually ship to their users.  Real ATM exposing devices are rare.

PPPoA is common in the UK.

	David


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Woodhouse June 13, 2012, 4:53 p.m. UTC | #5

On Wed, 2012-06-13 at 12:31 -0400, Benjamin LaHaise wrote:
> I would contend that PPPoE over br2684 is not the common case.  The vast 
> majority of users in client mode are going to be using PPPoE over an 
> ethernet link to a DSL modem (or cable or wireless radios even).  Just look 
> at what DSL modems are available for users in computer stores / what ISPs 
> actually ship to their users.  Real ATM exposing devices are rare.

I'm looking at the class of device on which OpenWRT runs. Linux is *on*
the router with the ADSL port, not connected to it via Ethernet.

I can buy lots of these in the shop. Anything that's an ADSL *router*
rather than *modem* is likely to be running, or at least capable of
running, Linux.

Admittedly, if you have access to the native ADSL interface then you'd
do a lot better to run PPPoA — but I already fixed this issue for PPPoA.
There are people in some parts of the world who are using PPPoEoA and
putting up with the resulting MTU issues because the *ISP* doesn't
support proper PPPoA.

And even if it *were* rare... this is the case that *should* work best,
where we have complete control of the hardware. There's no excuse for
the behaviour that we currently see with PPPoE on BR2684.

> > On the ISP side if the skb ends up sitting on a receive queue of a user
> > socket, and nothing is servicing that socket, surely the transmits on
> > this channel weren't happening anyway?
> 
> True, but it's a design issue we've had to contend with elsewhere in the 
> various tunnelling protocols.
> 
> Don't get me wrong: I am very much in favour of intelligent queue 
> management, but this approach simply does not work for the vast majority 
> of PPPoE users, while it adds overhead that will negatively impact access 
> concentrators. 

I think that's largely true of BQL in general, isn't it? That's OK; it's
a config option. I suspect if I make this accounting of PPPoE / L2TP
packets conditional on BQL (or perhaps on a separate config option
PPP_BQL) that ought to address your concern about the cases where you
don't need it?

David Woodhouse June 13, 2012, 4:59 p.m. UTC | #6

On Wed, 2012-06-13 at 17:32 +0100, David Laight wrote:
> > I would contend that PPPoE over br2684 is not the common case.  The vast 
> > majority of users in client mode are going to be using PPPoE over an 
> > ethernet link to a DSL modem (or cable or wireless radios even).  Just look 
> > at what DSL modems are available for users in computer stores / what ISPs 
> > actually ship to their users.  Real ATM exposing devices are rare.
> 
> PPPoA is common in the UK.

In the UK you tend to have the option of using PPPoA *or* PPPoE over
BR2684. The ISP's kit will handle both.

Ben's comment was about the *hardware*, though. If your "ADSL modem" is
a separate box which just bridges Ethernet to BR2684 on the ADSL link,
you're limited to using the PPPoE protocol over that.

Obviously, if you have a proper ADSL *router* and it's not just a PPP
bridge, then you can — and should — use PPPoA.

Benjamin LaHaise June 13, 2012, 5:21 p.m. UTC | #7

On Wed, Jun 13, 2012 at 05:53:03PM +0100, David Woodhouse wrote:
> I'm looking at the class of device on which OpenWRT runs. Linux is *on*
> the router with the ADSL port, not connected to it via Ethernet.

Ah, yes, that's a worthwhile pursuit.

> And even if it *were* rare... this is the case that *should* work best,
> where we have complete control of the hardware. There's no excuse for
> the behaviour that we currently see with PPPoE on BR2684.

*nod*

> I think that's largely true of BQL in general, isn't it? That's OK; it's
> a config option. I suspect if I make this accounting of PPPoE / L2TP
> packets conditional on BQL (or perhaps on a separate config option
> PPP_BQL) that ought to address your concern about the cases where you
> don't need it?

That would help.

On the whole question of PPPoE over intermediate ethernet links to ADSL 
modems, I think it would be possible to limit latency by implementing a 
sliding window clocked using LCP ECHO requests.  Does this sound worthwhile 
implementing?  What sort of queue depths are you looking at for the ATM 
devices you're working on?

		-ben

David Woodhouse June 13, 2012, 5:43 p.m. UTC | #8

On Wed, 2012-06-13 at 13:21 -0400, Benjamin LaHaise wrote:
> On the whole question of PPPoE over intermediate ethernet links to ADSL 
> modems, I think it would be possible to limit latency by implementing a 
> sliding window clocked using LCP ECHO requests.  Does this sound worthwhile 
> implementing?

Maybe, for someone who actually cares about those who are using separate
ADSL modems, and doesn't think they should just better hardware if they
care about the performance and queue management :)

>   What sort of queue depths are you looking at for the ATM devices
> you're working on? 

We're managing the queue to keep it short. On the PPPoATM side, we (now)
strictly limit the number of packets between the ppp_generic code and
the ATM device. It's basically one in-flight and one waiting to be
handed to the device in the TX done interrupt. PPP is designed to feed
new packets with low latency, after all.

The Solos ADSL device *used* to eat as many packets as it had internal
RAM to store them in, acknowledging the "transmit" as soon as it had
buffered them. I got Nathan to fix that some time ago, and the internal
queue is fairly short now.

Paul Mackerras June 14, 2012, 6:18 a.m. UTC | #9

On Wed, Jun 13, 2012 at 02:50:01PM +0100, David Woodhouse wrote:
> On Wed, 2012-06-13 at 10:57 +0100, David Woodhouse wrote:
> > This doesn't look *so* evil... if the basic concept of using
> > skb_orphan() and then setting our own destructor is OK, then I'll work
> > out the rest of the details and do it for l2tp too.
> 
> Stupid dwmw2. With patch this time...

> +static void pppoe_skb_destructor(struct sk_buff *skb)
> +{
> +	struct sock *sk = skb->sk;
> +	struct pppox_sock *po = pppox_sk(sk);
> +
> +	atomic_dec(&po->inflight);
> +	/* Schedule a call to ppp_output_wakeup(chan), if it was already blocked.
> +	   Mind for race conditions with another CPU which is in pppoe_xmit() 
> +	   right now. See commit 9d02daf7 in pppoatm. */
> +	sock_put(sk);
> +}

Umm, how does ppp_output_wakeup() actually get called?

Paul.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Woodhouse June 14, 2012, 6:49 a.m. UTC | #10

On Thu, 2012-06-14 at 16:18 +1000, Paul Mackerras wrote:
> Umm, how does ppp_output_wakeup() actually get called? 

Not directly from the destructor; it'll need to be from a tasklet like
the PPPoATM code does it. And the same race conditions, and the same
handling of them which makes me slightly uneasy as it depends quite
intimately on ppp_generic's internal locking scheme, will apply.

That's what I meant when I said I'd work out the rest of the details...

David Woodhouse June 14, 2012, 10:35 a.m. UTC | #11

On Thu, 2012-06-14 at 16:18 +1000, Paul Mackerras wrote:
> Umm, how does ppp_output_wakeup() actually get called?

In fact I'm thinking of eliminating ppp_output_wakeup() in the general
case.

The idea (and it is *just* an idea so far) is to introduce
ppp_sent_queue(), ppp_completed_queue() and ppp_reset_queue()¹ functions
which take a ppp_chan and map onto the corresponding netdev_* functions
for BQL.

Having done that, we should be able to trigger the wakeup automatically
from the ppp_completed_queue() function, and there's no need for channel
drivers to call ppp_output_wakeup() directly. Not only do we get proper
holistic queue length management, we also move the flow control into PPP
and get rid of the horrid dependency on internal PPP locking that's
documented in commit 9d02daf75², and which we'd have to address on the
PPPoX side too.

And the overhead that Ben is concerned about should be fairly minimal.

PPPoE performance regression

Commit Message

Comments

Patch