
fix packet loss and massive ping spikes with PPP multi-link

Message ID 4BB31E00.8060204@netservers.co.uk
State RFC, archived
Delegated to: David Miller

Commit Message

Ben McKeegan March 31, 2010, 10:03 a.m. UTC
>>> Making it runtime per link selectable would be nicer but thats a bit more
>>> work.
>> Doesn't it work already via echoing values to 
>> /sys/module/ppp/generic/parameters/ml_explode in the above code?
> 
> Thats runtime (and why I set 0600 in the permissions for the example) but
> not per link.
> 

I needed to do something similar a while back and took a very 
different approach, which I think is more flexible.  Rather than 
implement a new round-robin scheduler, I simply introduced a target 
minimum fragment size into the fragment size calculation, as a 
per-bundle parameter that can be configured via a new ioctl.  This 
modifies the algorithm so that it tries to limit the number of 
fragments such that each fragment is at least the minimum size.  If 
the minimum size is greater than the packet size, the packet will not 
be fragmented at all but will instead just be sent down the next 
available channel.

A pppd plugin issues the ioctl, allowing this to be tweaked per 
connection.  This approach is more flexible in that you can still have 
larger packets fragmented if you wish.

We've used a variant of this patch on our ADSL LNS pool for a few 
years now, with varying results.  We originally did it to save 
bandwidth: we have a per-packet overhead, and fragmenting tiny packets 
such as VoIP across a bundle of 4 lines made no sense at all.  We've 
also experimented with higher minimum settings, up to and above the 
link MTU, thus achieving the equivalent of Richard's patch.

In some cases this has improved performance; in others it has made it 
worse.  It depends a lot on the lines and traffic patterns, and it is 
certainly not a change we would wish to have on by default.  Any 
solution going into the mainline kernel would need to be tunable per 
connection.  One of the issues seems to be poor recovery from packet 
loss with low-volume, highly delay-sensitive traffic on large bundles 
of lines.  With Linux at both ends, you are relying on received 
sequence numbers to detect loss.  When packets are being fragmented 
across all channels and a fragment is lost, the receiving system is 
able to spot the lost fragment fairly quickly.  Once you start sending 
some multilink frames down individual channels, it takes much longer 
for the receiver to notice the packet loss on an individual channel. 
Until another fragment is successfully received on the lossy channel, 
the fragments of the incomplete frame sit in the queue, clogging up 
the other channels (the receiver is attempting to preserve the 
original packet order and is still waiting for the lost fragment).

The original patch is attached.  It almost certainly needs updating to 
take account of more recent changes to the multilink algorithm, but it 
may provide some inspiration.

Regards,
Ben.

Comments

Paul Mackerras May 29, 2010, 2:16 a.m. UTC | #1
On Wed, Mar 31, 2010 at 11:03:44AM +0100, Ben McKeegan wrote:

> I needed to do something similar a while back and I took a very
> different approach, which I think is more flexible.   Rather than
> implement a new round-robin scheduler I simply introduced a target
> minimum fragment size into the fragment size calculation, as a per
> bundle parameter that can be configured via a new ioctl.  This
> modifies the algorithm so that it tries to limit the number of
> fragments such that each fragment is at least the minimum size.  If
> the minimum size is greater than the packet size it will not be
> fragmented all but will instead just get sent down the next
> available channel.
> 
> A pppd plugin generates the ioctl call allowing this to be tweaked
> per connection.  It is more flexible in that you can still have the
> larger packets fragmented if you wish.

I like this a lot better than the other proposed patch.  It adds less
code because it uses the fact that ppp_mp_explode() already has a
round-robin capability using the ppp->nxchan field, plus it provides a
way to control it per bundle via pppd.

If you fix up the indentation issues (2-space indent in some of the
added code -- if you're using emacs, set c-basic-offset to 8), I'll
ack it and hopefully DaveM will pick it up.

Paul.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Richard Hartmann May 29, 2010, 9:06 a.m. UTC | #2
On Sat, May 29, 2010 at 04:16, Paul Mackerras <paulus@samba.org> wrote:

> I like this a lot better than the other proposed patch.

Not that it really matters, but we would be happy with either patch
as long as we can stop patching mainline :)


Richard
Richard Hartmann May 31, 2010, 1:39 p.m. UTC | #3
On Sat, May 29, 2010 at 04:16, Paul Mackerras <paulus@samba.org> wrote:

> If you fix up the indentation issues (2-space indent in some of the
> added code -- if you're using emacs, set c-basic-offset to 8), I'll
> ack it and hopefully DaveM will pick it up.

If no one submits it by tonight, I will probably send a cleaned-up
version of the alternative patch. Not that I want to steal anyone's
laurels, mind. But having it in mainline is better than not having
it in mainline.


Richard
Ben McKeegan May 31, 2010, 4:20 p.m. UTC | #4
Paul Mackerras wrote:

>> I needed to do something similar a while back and I took a very
>> different approach, which I think is more flexible.   Rather than
>> implement a new round-robin scheduler I simply introduced a target
>> minimum fragment size into the fragment size calculation, as a per
>> bundle parameter that can be configured via a new ioctl.  This
>> modifies the algorithm so that it tries to limit the number of
>> fragments such that each fragment is at least the minimum size.  If
>> the minimum size is greater than the packet size it will not be
>> fragmented all but will instead just get sent down the next
>> available channel.
>>
>> A pppd plugin generates the ioctl call allowing this to be tweaked
>> per connection.  It is more flexible in that you can still have the
>> larger packets fragmented if you wish.
> 
> I like this a lot better than the other proposed patch.  It adds less
> code because it uses the fact that ppp_mp_explode() already has a
> round-robin capability using the ppp->nxchan field, plus it provides a
> way to control it per bundle via pppd.
> 
> If you fix up the indentation issues (2-space indent in some of the
> added code -- if you're using emacs, set c-basic-offset to 8), I'll
> ack it and hopefully DaveM will pick it up.

Okay, I'll fix it up when I'm back at work Tuesday and submit (today is 
a UK public holiday) and also dig out and fix up the userspace code to 
go with it.

Ben.

Richard Hartmann June 1, 2010, 10:20 a.m. UTC | #5
On Sat, May 29, 2010 at 04:16, Paul Mackerras <paulus@samba.org> wrote:


> I'll
> ack it and hopefully DaveM will pick it up.

As 2.6.35-rc1 is out, does this mean we are looking at 2.6.36 at the
earliest? Or could this still make it in, given that it is

a) a bug fix
b) tested in the field
c) a small change


Thanks,
Richard
Ben McKeegan June 1, 2010, 11:18 a.m. UTC | #6
Richard Hartmann wrote:

> As 2.6.35-rc1 is out, does this mean that we are looking at 2.6.36 at the
> earliest? Or could this make it in as it is
> 
> a) a bug fix

This isn't really a bug fix.  It's a behavioural change to work around 
poor-quality or mismatched underlying PPP channels.

Regards,
Ben.
Richard Hartmann June 1, 2010, 11:28 a.m. UTC | #7
On Tue, Jun 1, 2010 at 13:18, Ben McKeegan <ben@netservers.co.uk> wrote:

> This isn't really a bug fix.  Its a behavioural change to work around poor
> quality/mismatched underlying PPP channels.

Maybe not a bug in the Linux kernel itself, but certainly in the real world
that exists around Linux. Similar to how a change to a device driver that
is needed to work around broken hardware is a bug fix, imo.


Richard
David Miller June 1, 2010, 10:15 p.m. UTC | #8
From: Richard Hartmann <richih.mailinglist@gmail.com>
Date: Tue, 1 Jun 2010 13:28:59 +0200

> Maybe not a bug in the Linux kernel itself, but certainly in the real world
> that exists around Linux. Similar to how a change to a device driver that
> is needed to work around broken hardware is a bug fix, imo.

It's not the same situation at all.

It is easier to fix misconfigured products that exist because of
software and configuration than it is to fix a physical piece of
hardware.

So you could work around it if you wanted to.

I definitely don't see this as -stable material, as a result.  We will
push it to net-next-2.6 and it will thus hit 2.6.36 as previously
mentioned.
Ben McKeegan June 2, 2010, 2:55 p.m. UTC | #9
Ben McKeegan wrote:
> Paul Mackerras wrote:
> 
>>> I needed to do something similar a while back and I took a very
>>> different approach, which I think is more flexible.   Rather than
>>> implement a new round-robin scheduler I simply introduced a target
>>> minimum fragment size into the fragment size calculation, as a per
>>> bundle parameter that can be configured via a new ioctl.  This
>>> modifies the algorithm so that it tries to limit the number of
>>> fragments such that each fragment is at least the minimum size.  If
>>> the minimum size is greater than the packet size it will not be
>>> fragmented all but will instead just get sent down the next
>>> available channel.
>>>
>>> A pppd plugin generates the ioctl call allowing this to be tweaked
>>> per connection.  It is more flexible in that you can still have the
>>> larger packets fragmented if you wish.
>>
>> I like this a lot better than the other proposed patch.  It adds less
>> code because it uses the fact that ppp_mp_explode() already has a
>> round-robin capability using the ppp->nxchan field, plus it provides a
>> way to control it per bundle via pppd.
>>
>> If you fix up the indentation issues (2-space indent in some of the
>> added code -- if you're using emacs, set c-basic-offset to 8), I'll
>> ack it and hopefully DaveM will pick it up.
> 
> Okay, I'll fix it up when I'm back at work Tuesday and submit (today is 
> a UK public holiday) and also dig out and fix up the userspace code to 
> go with it.

Still working on this; updating the patch wasn't as trivial as I 
thought, as it clashed with Gabriele Paoloni's ppp_mp_explode 
redesign.  However, while looking at this code I believe I have found 
a bug which might have been contributing to the poor performance the 
OP was experiencing.  For the case where channel speeds are unknown 
and there are more than 2 channels, it would miscalculate the fragment 
sizes so they are not balanced across the channels.

Patch for the bug to follow immediately.

Regards,
Ben.
Richard Hartmann Nov. 8, 2010, 2:05 p.m. UTC | #10
On Wed, Jun 2, 2010 at 16:55, Ben McKeegan <ben@netservers.co.uk> wrote:

> Still working on this, updating the patch wasn't as trivial as I thought as
> it clashed with Gabriele Paoloni's ppp_mp_explode redesign.  However, while
> looking at this code I believe I have found a bug which might have been
> contributing to the poor performance the OP was experiencing.   For the case
> where channel speeds are unknown and there are more than 2 channels it would
> miscalculate the fragment sizes so they are not balanced on the channels.
>
> Patch for the bug to follow immediately.

Is there any update on this? It's been quite some time since your last
update on this issue.


Thanks a lot,
Richard
Richard Hartmann Nov. 15, 2010, 12:07 p.m. UTC | #11
On Mon, Nov 8, 2010 at 15:05, Richard Hartmann
<richih.mailinglist@gmail.com> wrote:

> Is there any update on this? It's been quite some time since you last updated
> on this issue.

As it's been a week without any reply and as I know how stuff can
drown in more important work & projects, I am tentatively poking again
:)


Richard

Patch

diff -ubdr linux-2.6.16.16-l2tp/drivers/net/ppp_generic.c linux-2.6.16.16-l2tp-mppp/drivers/net/ppp_generic.c
--- linux-2.6.16.16-l2tp/drivers/net/ppp_generic.c	2006-05-11 02:56:24.000000000 +0100
+++ linux-2.6.16.16-l2tp-mppp/drivers/net/ppp_generic.c	2007-07-03 18:23:35.000000000 +0100
@@ -64,7 +64,7 @@ 
 
 #define MPHDRLEN	6	/* multilink protocol header length */
 #define MPHDRLEN_SSN	4	/* ditto with short sequence numbers */
-#define MIN_FRAG_SIZE	64
+#define MIN_FRAG_SIZE	256
 
 /*
  * An instance of /dev/ppp can be associated with either a ppp
@@ -120,6 +120,7 @@ 
 	unsigned long	last_recv;	/* jiffies when last pkt rcvd a0 */
 	struct net_device *dev;		/* network interface device a4 */
 #ifdef CONFIG_PPP_MULTILINK
+	int		minfragsize;	/* minimum size for a fragment */
 	int		nxchan;		/* next channel to send something on */
 	u32		nxseq;		/* next sequence number to send */
 	int		mrru;		/* MP: max reconst. receive unit */
@@ -767,6 +768,15 @@ 
 		ppp_recv_unlock(ppp);
 		err = 0;
 		break;
+
+	case PPPIOCSMINFRAG:
+		if (get_user(val, p))
+			break;
+		ppp_recv_lock(ppp);
+		ppp->minfragsize = val < 0 ? 0 : val;
+		ppp_recv_unlock(ppp);
+		err = 0;
+		break;
 #endif /* CONFIG_PPP_MULTILINK */
 
 	default:
@@ -1254,7 +1264,7 @@ 
 	int len, fragsize;
 	int i, bits, hdrlen, mtu;
 	int flen;
-	int navail, nfree;
+	int navail, nfree, nfrag;
 	int nbigger;
 	unsigned char *p, *q;
 	struct list_head *list;
@@ -1285,7 +1295,7 @@ 
 	 * the channels are free.  This gives much better TCP
 	 * performance if we have a lot of channels.
 	 */
-	if (nfree == 0 || nfree < navail / 2)
+	if (nfree == 0 || (nfree < navail / 2 && ppp->minfragsize == 0))
 		return 0;	/* can't take now, leave it in xmit_pending */
 
 	/* Do protocol field compression (XXX this should be optional) */
@@ -1302,13 +1312,24 @@ 
 	 * how small they are (i.e. even 0 length) in order to minimize
 	 * the time that it will take to detect when a channel drops
 	 * a fragment.
+	 * However, if ppp->minfragsize > 0 we try to avoid creating
+	 * fragments smaller than ppp->minfragsize and thus do not
+	 * always use all free channels.
 	 */
+	if (ppp->minfragsize > 0) {
+		nfrag = len / ppp->minfragsize;
+		if (nfrag < 1)
+			nfrag = 1;
+		else if (nfrag > nfree)
+			nfrag = nfree;
+	} else
+		nfrag = nfree;
 	fragsize = len;
-	if (nfree > 1)
-		fragsize = DIV_ROUND_UP(fragsize, nfree);
+	if (nfrag > 1)
+		fragsize = DIV_ROUND_UP(fragsize, nfrag);
 	/* nbigger channels get fragsize bytes, the rest get fragsize-1,
 	   except if nbigger==0, then they all get fragsize. */
-	nbigger = len % nfree;
+	nbigger = len % nfrag;
 
 	/* skip to the channel after the one we last used
 	   and start at that one */
@@ -1323,7 +1344,7 @@ 
 
 	/* create a fragment for each channel */
 	bits = B;
-	while (nfree > 0 || len > 0) {
+	while (len > 0 || (nfree > 0 && ppp->minfragsize == 0)) {
 		list = list->next;
 		if (list == &ppp->channels) {
 			i = 0;
@@ -1371,7 +1392,7 @@ 
 			mtu = 4;
 		if (flen > mtu)
 			flen = mtu;
-		if (flen == len && nfree == 0)
+		if (flen == len && (nfree == 0 || ppp->minfragsize != 0))
 			bits |= E;
 		frag = alloc_skb(flen + hdrlen + (flen == 0), GFP_ATOMIC);
 		if (frag == 0)
@@ -2435,6 +2456,7 @@ 
 	spin_lock_init(&ppp->rlock);
 	spin_lock_init(&ppp->wlock);
 #ifdef CONFIG_PPP_MULTILINK
+	ppp->minfragsize = MIN_FRAG_SIZE;
 	ppp->minseq = -1;
 	skb_queue_head_init(&ppp->mrq);
 #endif /* CONFIG_PPP_MULTILINK */
diff -ubdr linux-2.6.16.16-l2tp/include/linux/if_ppp.h linux-2.6.16.16-l2tp-mppp/include/linux/if_ppp.h
--- linux-2.6.16.16-l2tp/include/linux/if_ppp.h	2006-05-12 13:45:00.000000000 +0100
+++ linux-2.6.16.16-l2tp-mppp/include/linux/if_ppp.h	2007-07-03 18:15:27.000000000 +0100
@@ -162,6 +162,7 @@ 
 #define PPPIOCATTCHAN	_IOW('t', 56, int)	/* attach to ppp channel */
 #define PPPIOCGCHAN	_IOR('t', 55, int)	/* get ppp channel number */
 #define PPPIOCGL2TPSTATS _IOR('t', 54, struct pppol2tp_ioc_stats)
+#define PPPIOCSMINFRAG	_IOW('t', 53, int)	/* minimum fragment size */
 
 #define SIOCGPPPSTATS   (SIOCDEVPRIVATE + 0)
 #define SIOCGPPPVER     (SIOCDEVPRIVATE + 1)	/* NEVER change this!! */