diff mbox series

[net-next,v12,3/7] sch_cake: Add optional ACK filter

Message ID 152650254614.25701.1377066681230937234.stgit@alrua-kau
State Changes Requested, archived
Delegated to: David Miller
Series sched: Add Common Applications Kept Enhanced (cake) qdisc

Commit Message

Toke Høiland-Jørgensen May 16, 2018, 8:29 p.m. UTC
The ACK filter is an optional feature of CAKE which is designed to improve
performance on links with very asymmetrical rate limits. On such links
(which are unfortunately quite prevalent, especially for DSL and cable
subscribers), the downstream throughput can be limited by the number of
ACKs capable of being transmitted in the *upstream* direction.

Filtering ACKs can, in general, have adverse effects on TCP performance
because it interferes with ACK clocking (especially in slow start), and it
reduces the flow's resiliency to ACKs being dropped further along the path.
To alleviate these drawbacks, the ACK filter in CAKE tries its best to
always keep enough ACKs queued to ensure forward progress in the TCP flow
being filtered. It does this by only filtering redundant ACKs. In its
default 'conservative' mode, the filter will always keep at least two
redundant ACKs in the queue, while in 'aggressive' mode, it will filter
down to a single ACK.

The ACK filter works by inspecting the per-flow queue on every packet
enqueue. Starting at the head of the queue, the filter looks for another
eligible packet to drop (so the ACK being dropped is always closer to the
head of the queue than the packet being enqueued). An ACK is eligible only
if it ACKs *fewer* cumulative bytes than the new packet being enqueued.
This prevents duplicate ACKs from being filtered (unless SACK options are
also present), to avoid interfering with retransmission logic. In
aggressive mode, an eligible packet is always dropped, while in
conservative mode, at least two ACKs are kept in the queue. Only pure ACKs
(with no data segments) are considered eligible for dropping, but when an
ACK with data segments is enqueued, this can cause another pure ACK to
become eligible for dropping.
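The eligibility test described above can be sketched in isolation, roughly as follows. This is a hedged illustration with hypothetical names, not the code from the patch; serial arithmetic is used so that 32-bit sequence-number wraparound is handled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch of the redundancy test: a pure ACK already queued
 * is eligible for dropping only if the newly enqueued ACK acknowledges
 * strictly more cumulative bytes, or acknowledges the same bytes while
 * also carrying SACK information. Function and parameter names are
 * hypothetical.
 */
static bool queued_ack_is_redundant(uint32_t queued_ack_seq,
				    uint32_t new_ack_seq, bool new_has_sack)
{
	/* serial (wraparound-safe) comparison of 32-bit sequence numbers */
	int32_t diff = (int32_t)(new_ack_seq - queued_ack_seq);

	if (diff > 0)
		return true;	/* new ACK covers strictly more data */
	if (diff == 0 && new_has_sack)
		return true;	/* same ack_seq, but new SACK information */
	return false;		/* dropping the queued ACK would lose information */
}
```

In conservative mode the filter would additionally require a second eligible ACK before dropping, so at least two redundant ACKs remain queued.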

The approach described above ensures that this ACK filter avoids most of
the drawbacks of a naive filtering mechanism that only keeps flow state but
does not inspect the queue. This is the rationale for including the ACK
filter in CAKE itself rather than as a separate module (as a TC filter, for
instance).

Our performance evaluation has shown that on a 30/1 Mbps link with a
bidirectional traffic test (RRUL), turning on the ACK filter on the
upstream link improves downstream throughput by ~20% (both modes) and
upstream throughput by ~12% in conservative mode and ~40% in aggressive
mode, at the cost of ~5ms of inter-flow latency due to the increased
congestion.

In *really* pathological cases, the effect can be much larger; for instance,
the ACK filter increases the achievable downstream throughput on a link
with 100 Kbps in the upstream direction by an order of magnitude (from ~2.5
Mbps to ~25 Mbps).

Finally, even though we consider the ACK filter to be safer than most, we
do not recommend turning it on everywhere: on more symmetrical link
bandwidths the effect is negligible at best.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
---
 net/sched/sch_cake.c |  260 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 258 insertions(+), 2 deletions(-)

Comments

Eric Dumazet May 17, 2018, 11:04 a.m. UTC | #1
On 05/16/2018 01:29 PM, Toke Høiland-Jørgensen wrote:
> The ACK filter is an optional feature of CAKE which is designed to improve
> performance on links with very asymmetrical rate limits. On such links
> (which are unfortunately quite prevalent, especially for DSL and cable
> subscribers), the downstream throughput can be limited by the number of
> ACKs capable of being transmitted in the *upstream* direction.
> 

...

> 
> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
> ---
>  net/sched/sch_cake.c |  260 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 258 insertions(+), 2 deletions(-)
> 
>

I have decided to implement ACK compression in TCP stack itself.

The first step is to take care of SACKs, which are the main source of the bloat,
since we send one SACK for every incoming out-of-order packet.

These SACKs are not only causing pain on the network, they also cause
the sender to send one MSS at a time (TSO auto defer is not engaged in this case),
thus starting to fill its RTX queue with pathological skbs (1 MSS each), increasing
processing time.

I see that your ACK filter does not take care of this common case :)

Doing the filtering in TCP has the immense advantage of knowing the RTT, and thus
being able to use heuristics that cause less damage.
Toke Høiland-Jørgensen May 17, 2018, 11:23 a.m. UTC | #2
Eric Dumazet <eric.dumazet@gmail.com> writes:

> On 05/16/2018 01:29 PM, Toke Høiland-Jørgensen wrote:
>> The ACK filter is an optional feature of CAKE which is designed to improve
>> performance on links with very asymmetrical rate limits. On such links
>> (which are unfortunately quite prevalent, especially for DSL and cable
>> subscribers), the downstream throughput can be limited by the number of
>> ACKs capable of being transmitted in the *upstream* direction.
>> 
>
> ...
>
>> 
>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>> ---
>>  net/sched/sch_cake.c |  260 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 258 insertions(+), 2 deletions(-)
>> 
>>
>
> I have decided to implement ACK compression in TCP stack itself.

Awesome! Will look forward to seeing that!

> First step is to take care of SACK, which are the main source of the
> bloat, since we send one SACK for every incoming out-of-order packet.
>
> These SACK are not only causing pain on the network, they also cause
> the sender to send one MSS at a time (TSO auto defer is not engaged in
> this case), thus starting to fill its RTX queue with pathological skbs
> (1-MSS each), increasing processing time.
>
> I see that your ACK filter does not take care of this common case :)

We don't do full parsing of SACKs, no; we were trying to keep things
simple... We do detect the presence of SACK options, though, and the
presence of SACK options on an ACK will make previous ACKs be considered
redundant.

> Doing the filtering in TCP has the immense advantage of knowing the
> RTT and thus be able to use heuristics causing less damage.

Quite so. I'll be quite happy if the CAKE ACK filter can be delegated to
something only relevant for the poor sods stuck on proprietary operating
systems :)


Are you satisfied that the current version of the filter doesn't mangle
the skbs or crash the kernel?

-Toke
Eric Dumazet May 17, 2018, 11:56 a.m. UTC | #3
On 05/17/2018 04:23 AM, Toke Høiland-Jørgensen wrote:

> 
> We don't do full parsing of SACKs, no; we were trying to keep things
> simple... We do detect the presence of SACK options, though, and the
> presence of SACK options on an ACK will make previous ACKs be considered
> redundant.
> 

But they are not redundant in some cases, particularly when reorders happen in the network.
Toke Høiland-Jørgensen May 17, 2018, 1:11 p.m. UTC | #4
Eric Dumazet <eric.dumazet@gmail.com> writes:

> On 05/17/2018 04:23 AM, Toke Høiland-Jørgensen wrote:
>
>> 
>> We don't do full parsing of SACKs, no; we were trying to keep things
>> simple... We do detect the presence of SACK options, though, and the
>> presence of SACK options on an ACK will make previous ACKs be considered
>> redundant.
>> 
>
> But they are not redundant in some cases, particularly when reorders
> happen in the network.

Huh. I was under the impression that SACKs were basically cumulative
until cleared.

I.e., in packet sequence ABCDE where B and D are lost, C would have
SACK(B) and E would have SACK(B,D). Are you saying that E would only
have SACK(D)?

-Toke
Ryan Mounce May 18, 2018, 2:36 a.m. UTC | #5
On 17 May 2018 at 22:41, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
> Eric Dumazet <eric.dumazet@gmail.com> writes:
>
>> On 05/17/2018 04:23 AM, Toke Høiland-Jørgensen wrote:
>>
>>>
>>> We don't do full parsing of SACKs, no; we were trying to keep things
>>> simple... We do detect the presence of SACK options, though, and the
>>> presence of SACK options on an ACK will make previous ACKs be considered
>>> redundant.
>>>
>>
>> But they are not redundant in some cases, particularly when reorders
>> happen in the network.
>
> Huh. I was under the impression that SACKs were basically cumulative
> until cleared.
>
> I.e., in packet sequence ABCDE where B and D are lost, C would have
> SACK(B) and E would have SACK(B,D). Are you saying that E would only
> have SACK(D)?

SACK works by acknowledging additional ranges above those that have
been ACKed, rather than ACKing up to the largest seen sequence number
and reporting missing ranges before that.

A - ACK(A)
B - lost
C - ACK(A) + SACK(C)
D - lost
E - ACK(A) + SACK(C, E)

Cake does check that the ACK sequence number is greater, or that it is
equal and the 'newer' ACK has the SACK option present. It doesn't
compare the sequence numbers inside the two SACKs. If the two SACKs in the
above example had been reordered before reaching cake's ACK filter in
aggressive mode, the wrong one would be filtered.

This is a limitation of my naive SACK handling in cake. The default
'conservative' mode happens to mitigate the problem in the above
scenario, but the issue could still present itself in more
pathological cases. It's fixable; however, I'm not sure this corner
case is sufficiently common or severe to warrant the extra complexity.

Ryan.
Eric Dumazet May 18, 2018, 4:08 a.m. UTC | #6
On 05/17/2018 07:36 PM, Ryan Mounce wrote:
> On 17 May 2018 at 22:41, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>> Eric Dumazet <eric.dumazet@gmail.com> writes:
>>
>>> On 05/17/2018 04:23 AM, Toke Høiland-Jørgensen wrote:
>>>
>>>>
>>>> We don't do full parsing of SACKs, no; we were trying to keep things
>>>> simple... We do detect the presence of SACK options, though, and the
>>>> presence of SACK options on an ACK will make previous ACKs be considered
>>>> redundant.
>>>>
>>>
>>> But they are not redundant in some cases, particularly when reorders
>>> happen in the network.
>>
>> Huh. I was under the impression that SACKs were basically cumulative
>> until cleared.
>>
>> I.e., in packet sequence ABCDE where B and D are lost, C would have
>> SACK(B) and E would have SACK(B,D). Are you saying that E would only
>> have SACK(D)?
> 
> SACK works by acknowledging additional ranges above those that have
> been ACKed, rather than ACKing up to the largest seen sequence number
> and reporting missing ranges before that.
> 
> A - ACK(A)
> B - lost
> C - ACK(A) + SACK(C)
> D - lost
> E - ACK(A) + SACK(C, E)
> 
> Cake does check that the ACK sequence number is greater, or if it is
> equal and the 'newer' ACK has the SACK option present. It doesn't
> compare the sequence numbers inside two SACKs. If the two SACKs in the
> above example had been reordered before reaching cake's ACK filter in
> aggressive mode, the wrong one will be filtered.
> 
> This is a limitation of my naive SACK handling in cake. The default
> 'conservative' mode happens to mitigate the problem in the above
> scenario, but the issue could still present itself in more
> pathological cases. It's fixable, however I'm not sure this corner
> case is sufficiently common or severe to warrant the extra complexity.

The extra complexity is absolutely required for inclusion in upstream linux.

I recommend reading RFC 2018, the whole of section 4 (Generating Sack
Options: Data Receiver Behavior).

The proposed ACK filter in Cake is messing with the protocol, since the first rule is not respected:

* The first SACK block (i.e., the one immediately following the
      kind and length fields in the option) MUST specify the contiguous
      block of data containing the segment which triggered this ACK,
      unless that segment advanced the Acknowledgment Number field in
      the header.  This assures that the ACK with the SACK option
      reflects the most recent change in the data receiver's buffer
      queue.


An ACK filter must either:

Not merge ACKs if they contain different SACK blocks.

Or make a precise analysis of the SACK blocks to determine whether the merge
is allowed, i.e. no useful information is lost.

The sender should get all the information as to which segments were received
correctly, assuming no ACKs are dropped because of congestion on the return
path.
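The second option Eric describes (a precise analysis confirming that no information is lost) might look something like the following sketch, assuming the SACK blocks have already been parsed into (left, right) edge pairs. All names here are hypothetical, not from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* one SACK block: left and right sequence-number edges, host byte order */
struct sack_block {
	uint32_t left;
	uint32_t right;
};

/* true if block b lies entirely inside some block of 'set' (serial
 * arithmetic, so sequence-number wraparound is tolerated) */
static bool block_covered(const struct sack_block *b,
			  const struct sack_block *set, int n)
{
	int i;

	for (i = 0; i < n; i++)
		if ((int32_t)(b->left - set[i].left) >= 0 &&
		    (int32_t)(set[i].right - b->right) >= 0)
			return true;
	return false;
}

/* Dropping the older ACK loses no SACK information only if every block
 * the old ACK reported is still covered by some block in the new ACK. */
static bool sack_merge_allowed(const struct sack_block *old_blk, int n_old,
			       const struct sack_block *new_blk, int n_new)
{
	int i;

	for (i = 0; i < n_old; i++)
		if (!block_covered(&old_blk[i], new_blk, n_new))
			return false;
	return true;
}
```

Under this rule, an ACK whose blocks merely overlap (rather than contain) the older ACK's blocks would not permit a merge, which is the conservative reading of RFC 2018.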
Cong Wang May 18, 2018, 4:27 a.m. UTC | #7
On Thu, May 17, 2018 at 4:23 AM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
> Eric Dumazet <eric.dumazet@gmail.com> writes:
>
>> On 05/16/2018 01:29 PM, Toke Høiland-Jørgensen wrote:
>>> The ACK filter is an optional feature of CAKE which is designed to improve
>>> performance on links with very asymmetrical rate limits. On such links
>>> (which are unfortunately quite prevalent, especially for DSL and cable
>>> subscribers), the downstream throughput can be limited by the number of
>>> ACKs capable of being transmitted in the *upstream* direction.
>>>
>>
>> ...
>>
>>>
>>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>>> ---
>>>  net/sched/sch_cake.c |  260 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 258 insertions(+), 2 deletions(-)
>>>
>>>
>>
>> I have decided to implement ACK compression in TCP stack itself.
>
> Awesome! Will look forward to seeing that!

+1

It is really odd to put this into a TC qdisc; the TCP stack is a much better
place.
Ryan Mounce May 18, 2018, 7:43 a.m. UTC | #8
On 18 May 2018 at 13:38, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> On 05/17/2018 07:36 PM, Ryan Mounce wrote:
>> On 17 May 2018 at 22:41, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>>> Eric Dumazet <eric.dumazet@gmail.com> writes:
>>>
>>>> On 05/17/2018 04:23 AM, Toke Høiland-Jørgensen wrote:
>>>>
>>>>>
>>>>> We don't do full parsing of SACKs, no; we were trying to keep things
>>>>> simple... We do detect the presence of SACK options, though, and the
>>>>> presence of SACK options on an ACK will make previous ACKs be considered
>>>>> redundant.
>>>>>
>>>>
>>>> But they are not redundant in some cases, particularly when reorders
>>>> happen in the network.
>>>
>>> Huh. I was under the impression that SACKs were basically cumulative
>>> until cleared.
>>>
>>> I.e., in packet sequence ABCDE where B and D are lost, C would have
>>> SACK(B) and E would have SACK(B,D). Are you saying that E would only
>>> have SACK(D)?
>>
>> SACK works by acknowledging additional ranges above those that have
>> been ACKed, rather than ACKing up to the largest seen sequence number
>> and reporting missing ranges before that.
>>
>> A - ACK(A)
>> B - lost
>> C - ACK(A) + SACK(C)
>> D - lost
>> E - ACK(A) + SACK(C, E)
>>
>> Cake does check that the ACK sequence number is greater, or if it is
>> equal and the 'newer' ACK has the SACK option present. It doesn't
>> compare the sequence numbers inside two SACKs. If the two SACKs in the
>> above example had been reordered before reaching cake's ACK filter in
>> aggressive mode, the wrong one will be filtered.
>>
>> This is a limitation of my naive SACK handling in cake. The default
>> 'conservative' mode happens to mitigate the problem in the above
>> scenario, but the issue could still present itself in more
>> pathological cases. It's fixable, however I'm not sure this corner
>> case is sufficiently common or severe to warrant the extra complexity.
>
> The extra complexity is absolutely requested for inclusion in upstream linux.
>
> I recommend reading rfc 2018, whole section 4 (Generating Sack Options: Data Receiver Behavior
> )
>
> Proposed ACK filter in Cake is messing the protocol, since the first rule is not respected
>
> * The first SACK block (i.e., the one immediately following the
>       kind and length fields in the option) MUST specify the contiguous
>       block of data containing the segment which triggered this ACK,
>       unless that segment advanced the Acknowledgment Number field in
>       the header.  This assures that the ACK with the SACK option
>       reflects the most recent change in the data receiver's buffer
>       queue.
>
>
> An ACK filter must either :
>
> Not merge ACK if they contain different SACK blocks.
>
> Or make a precise analysis of the SACK blocks to determine if the merge is allowed,
> ie no useful information is lost.
>
> The sender should get all the information as which segments were received correctly,
> assuming no ACK are dropped because of congestion on return path.

I have submitted a patch to cake's ACK filter; now it will only filter
an old SACK if its highest right block edge value is smaller than that
of the most recently received SACK. If there is reordering, neither
will be filtered.
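The comparison Ryan describes could be sketched as a helper that walks the raw TCP option bytes and extracts the highest SACK right edge. This is a standalone illustration rather than the submitted patch; option kinds 0, 1 and 5 are TCPOPT_EOL, TCPOPT_NOP and TCPOPT_SACK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Walk raw TCP option bytes and report, via *edge, the highest right
 * edge found in any SACK block; returns false if no SACK option is
 * present. 'opts' points at the first option byte, 'opts_len' is the
 * total option length in bytes. Illustrative names, not kernel code.
 */
static bool sack_highest_right_edge(const uint8_t *opts, int opts_len,
				    uint32_t *edge)
{
	uint32_t highest = 0;
	bool found = false;
	int i = 0;

	while (i < opts_len) {
		uint8_t kind = opts[i];
		uint8_t len;
		int off;

		if (kind == 0)			/* TCPOPT_EOL */
			break;
		if (kind == 1) {		/* TCPOPT_NOP */
			i++;
			continue;
		}
		if (i + 1 >= opts_len)
			break;
		len = opts[i + 1];
		if (len < 2 || i + len > opts_len)
			break;
		if (kind == 5) {		/* TCPOPT_SACK: 8-byte blocks */
			for (off = i + 2; off + 8 <= i + len; off += 8) {
				uint32_t right =
					((uint32_t)opts[off + 4] << 24) |
					((uint32_t)opts[off + 5] << 16) |
					((uint32_t)opts[off + 6] << 8) |
					 (uint32_t)opts[off + 7];

				/* serial comparison, wraparound-safe */
				if (!found || (int32_t)(right - highest) > 0)
					highest = right;
				found = true;
			}
		}
		i += len;
	}

	if (found)
		*edge = highest;
	return found;
}
```

An older ACK would then only be filtered when its highest right edge is strictly below the newer ACK's, and left in place (as Ryan describes) when reordering makes the comparison ambiguous.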
Kevin Darbyshire-Bryant May 18, 2018, 11:18 a.m. UTC | #9
> On 18 May 2018, at 05:27, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> 
> On Thu, May 17, 2018 at 4:23 AM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>> Eric Dumazet <eric.dumazet@gmail.com> writes:
>> 
>>> On 05/16/2018 01:29 PM, Toke Høiland-Jørgensen wrote:
>>>> The ACK filter is an optional feature of CAKE which is designed to improve
>>>> performance on links with very asymmetrical rate limits. On such links
>>>> (which are unfortunately quite prevalent, especially for DSL and cable
>>>> subscribers), the downstream throughput can be limited by the number of
>>>> ACKs capable of being transmitted in the *upstream* direction.
>>>> 
>>> 
>>> ...
>>> 
>>>> 
>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>>>> ---
>>>> net/sched/sch_cake.c |  260 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> 1 file changed, 258 insertions(+), 2 deletions(-)
>>>> 
>>>> 
>>> 
>>> I have decided to implement ACK compression in TCP stack itself.
>> 
>> Awesome! Will look forward to seeing that!
> 
> +1
> 
> It is really odd to put into a TC qdisc, TCP stack is a much better
> place.

Speaking as a user of cake’s ack filtering: although it may be an odd place, it is incredibly useful in my linux-based home router middle box, where it extracts extra usable bandwidth from my asymmetric link.  And whilst ack compression/reduction/filtering (call it what you will) will come to the linux TCP stack, other OS stacks are as yet less enlightened and benefit from the router’s tweaking/meddling/interference.
Sebastian Moeller May 18, 2018, 11:23 a.m. UTC | #10
Hi Kevin,


> On May 18, 2018, at 13:18, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote:
> 
> 
> 
>> On 18 May 2018, at 05:27, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> 
>> On Thu, May 17, 2018 at 4:23 AM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>>> Eric Dumazet <eric.dumazet@gmail.com> writes:
>>> 
>>>> On 05/16/2018 01:29 PM, Toke Høiland-Jørgensen wrote:
>>>>> The ACK filter is an optional feature of CAKE which is designed to improve
>>>>> performance on links with very asymmetrical rate limits. On such links
>>>>> (which are unfortunately quite prevalent, especially for DSL and cable
>>>>> subscribers), the downstream throughput can be limited by the number of
>>>>> ACKs capable of being transmitted in the *upstream* direction.
>>>>> 
>>>> 
>>>> ...
>>>> 
>>>>> 
>>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>>>>> ---
>>>>> net/sched/sch_cake.c |  260 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>> 1 file changed, 258 insertions(+), 2 deletions(-)
>>>>> 
>>>>> 
>>>> 
>>>> I have decided to implement ACK compression in TCP stack itself.
>>> 
>>> Awesome! Will look forward to seeing that!
>> 
>> +1
>> 
>> It is really odd to put into a TC qdisc, TCP stack is a much better
>> place.
> 
> Speaking as a user of cake’s ack filtering, although it may be an odd place, it is incredibly useful in my linux based home router middle box that usefully extracts extra usable bandwidth from my asymmetric link.  And whilst ack compression/reduction/filtering call it what you will, will come to the linux TCP stack, as yet other OS stacks are less enlightened and benefit from the router’s tweaking/meddling/interference.

	I believe this is a good point: it is really the asymmetry of the link that makes ACK suppression more or less desirable, and it is quite helpful if the adaptation to that link only needs to be configured on one device. I think this is similar to applying MSS clamping on a router to account for, say, PPPoE overhead, as compared to relying on path MTU discovery or having to configure the MTU on all end-points.



Best Regards



Eric Dumazet May 18, 2018, 3:20 p.m. UTC | #11
On 05/18/2018 04:18 AM, Kevin Darbyshire-Bryant wrote:
> 

> Speaking as a user of cake’s ack filtering, although it may be an odd place, it is incredibly useful in my linux based home router middle box that usefully extracts extra usable bandwidth from my asymmetric link.  And whilst ack compression/reduction/filtering call it what you will, will come to the linux TCP stack, as yet other OS stacks are less enlightened and benefit from the router’s tweaking/meddling/interference.
> 
> 

Kevin, there is no doubt the feature might be useful.

But it has to be properly implemented, or we risk blocking future TCP improvements.

An ACK filter MUST parse the TCP options.

1) If some options cannot be understood, the ACK MUST NOT be dropped.

2) SACK options MUST be carefully analyzed

3) Timestamps also need some care.

Too many middle-boxes are breaking TCP; we do not want to carry another TCP
blackhole in upstream linux. That would be a shame.

conntrack had bugs that seriously impacted TCP Fastopen deployment for example.
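Rule 1 above could be enforced with a whitelist scan of the option bytes, as in this rough sketch. The set of understood kinds shown here (EOL, NOP, MSS, window scale, SACK-permitted, SACK, timestamps) is illustrative; a real filter would whitelist exactly the options it knows how to preserve.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true only if every TCP option in the buffer is one this
 * hypothetical filter understands; any unknown or malformed option
 * means the ACK must not be dropped.
 */
static bool all_options_understood(const uint8_t *opts, int opts_len)
{
	int i = 0;

	while (i < opts_len) {
		uint8_t kind = opts[i];
		uint8_t len;

		if (kind == 0)		/* TCPOPT_EOL: end of options */
			return true;
		if (kind == 1) {	/* TCPOPT_NOP: padding */
			i++;
			continue;
		}
		/* 2=MSS, 3=window scale, 4=SACK-permitted, 5=SACK, 8=timestamps */
		if (kind != 2 && kind != 3 && kind != 4 &&
		    kind != 5 && kind != 8)
			return false;	/* unknown option: do not drop this ACK */
		if (i + 1 >= opts_len)
			return false;	/* truncated option */
		len = opts[i + 1];
		if (len < 2 || i + len > opts_len)
			return false;	/* malformed length */
		i += len;
	}
	return true;
}
```

A filter built this way fails safe: new TCP options it has never seen simply pass through unfiltered instead of being silently discarded.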

Patch

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index d515f18f8460..65439b643c92 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -755,6 +755,239 @@  static void flow_queue_add(struct cake_flow *flow, struct sk_buff *skb)
 	skb->next = NULL;
 }
 
+static struct iphdr *cake_get_iphdr(const struct sk_buff *skb,
+				    struct ipv6hdr *buf)
+{
+	unsigned int offset = skb_network_offset(skb);
+	struct iphdr *iph;
+
+	iph = skb_header_pointer(skb, offset, sizeof(struct iphdr), buf);
+
+	if (!iph)
+		return NULL;
+
+	if (iph->version == 4 && iph->protocol == IPPROTO_IPV6)
+		return skb_header_pointer(skb, offset + iph->ihl * 4,
+					  sizeof(struct ipv6hdr), buf);
+
+	else if (iph->version == 4)
+		return iph;
+
+	else if (iph->version == 6)
+		return skb_header_pointer(skb, offset, sizeof(struct ipv6hdr),
+					  buf);
+
+	return NULL;
+}
+
+static struct tcphdr *cake_get_tcphdr(const struct sk_buff *skb,
+				      void *buf, unsigned int bufsize)
+{
+	unsigned int offset = skb_network_offset(skb);
+	const struct ipv6hdr *ipv6h;
+	const struct tcphdr *tcph;
+	const struct iphdr *iph;
+	struct ipv6hdr _ipv6h;
+	struct tcphdr _tcph;
+
+	ipv6h = skb_header_pointer(skb, offset, sizeof(_ipv6h), &_ipv6h);
+
+	if (!ipv6h)
+		return NULL;
+
+	if (ipv6h->version == 4) {
+		iph = (struct iphdr *)ipv6h;
+		offset += iph->ihl * 4;
+
+		/* special-case 6in4 tunnelling, as that is a common way to get
+		 * v6 connectivity in the home
+		 */
+		if (iph->protocol == IPPROTO_IPV6) {
+			ipv6h = skb_header_pointer(skb, offset,
+						   sizeof(_ipv6h), &_ipv6h);
+
+			if (!ipv6h || ipv6h->nexthdr != IPPROTO_TCP)
+				return NULL;
+
+			offset += sizeof(struct ipv6hdr);
+
+		} else if (iph->protocol != IPPROTO_TCP) {
+			return NULL;
+		}
+
+	} else if (ipv6h->version == 6) {
+		if (ipv6h->nexthdr != IPPROTO_TCP)
+			return NULL;
+
+		offset += sizeof(struct ipv6hdr);
+	} else {
+		return NULL;
+	}
+
+	tcph = skb_header_pointer(skb, offset, sizeof(_tcph), &_tcph);
+	if (!tcph)
+		return NULL;
+
+	return skb_header_pointer(skb, offset,
+				  min(__tcp_hdrlen(tcph), bufsize), buf);
+}
+
+static bool cake_tcph_is_sack(const struct tcphdr *tcph)
+{
+	/* inspired by tcp_parse_options in tcp_input.c */
+	int length = __tcp_hdrlen(tcph) - sizeof(struct tcphdr);
+	const u8 *ptr = (const u8 *)(tcph + 1);
+
+	while (length > 0) {
+		int opcode = *ptr++;
+		int opsize;
+
+		if (opcode == TCPOPT_EOL)
+			break;
+		if (opcode == TCPOPT_NOP) {
+			length--;
+			continue;
+		}
+		opsize = *ptr++;
+		if (opsize < 2 || opsize > length)
+			break;
+		if (opcode == TCPOPT_SACK)
+			return true;
+		ptr += opsize - 2;
+		length -= opsize;
+	}
+
+	return false;
+}
+
+static struct sk_buff *cake_ack_filter(struct cake_sched_data *q,
+				       struct cake_flow *flow)
+{
+	bool aggressive = q->ack_filter == CAKE_ACK_AGGRESSIVE;
+	struct sk_buff *elig_ack = NULL, *elig_ack_prev = NULL;
+	struct sk_buff *skb_check, *skb_prev = NULL;
+	const struct ipv6hdr *ipv6h, *ipv6h_check;
+	const struct tcphdr *tcph, *tcph_check;
+	const struct iphdr *iph, *iph_check;
+	struct ipv6hdr _iph, _iph_check;
+	const struct sk_buff *skb;
+	struct tcphdr _tcph_check;
+	int seglen, num_found = 0;
+	unsigned char _tcph[64]; /* need to hold maximum hdr size */
+
+	/* no other possible ACKs to filter */
+	if (flow->head == flow->tail)
+		return NULL;
+
+	skb = flow->tail;
+	tcph = cake_get_tcphdr(skb, _tcph, sizeof(_tcph));
+	iph = cake_get_iphdr(skb, &_iph);
+	if (!tcph)
+		return NULL;
+
+	/* the 'triggering' packet need only have the ACK flag set.
+	 * also check that SYN is not set, as there won't be any previous ACKs.
+	 */
+	if ((tcp_flag_word(tcph) &
+	     (TCP_FLAG_ACK | TCP_FLAG_SYN)) != TCP_FLAG_ACK)
+		return NULL;
+
+	/* the 'triggering' ACK is at the tail of the queue, we have already
+	 * returned if it is the only packet in the flow. loop through the rest
+	 * of the queue looking for pure ACKs with the same 5-tuple as the
+	 * triggering one.
+	 */
+	for (skb_check = flow->head;
+	     skb_check && skb_check != skb;
+	     skb_prev = skb_check, skb_check = skb_check->next) {
+		iph_check = cake_get_iphdr(skb_check, &_iph_check);
+		tcph_check = cake_get_tcphdr(skb_check, &_tcph_check,
+					     sizeof(_tcph_check));
+
+		/* only TCP packets with matching 5-tuple are eligible */
+		if (!tcph_check || iph->version != iph_check->version ||
+		    tcph_check->source != tcph->source ||
+		    tcph_check->dest != tcph->dest)
+			continue;
+
+		if (iph_check->version == 4) {
+			if (iph_check->saddr != iph->saddr ||
+			    iph_check->daddr != iph->daddr)
+				continue;
+
+			seglen = ntohs(iph_check->tot_len) -
+				       (4 * iph_check->ihl);
+		} else if (iph_check->version == 6) {
+			ipv6h = (struct ipv6hdr *)iph;
+			ipv6h_check = (struct ipv6hdr *)iph_check;
+
+			if (ipv6_addr_cmp(&ipv6h_check->saddr, &ipv6h->saddr) ||
+			    ipv6_addr_cmp(&ipv6h_check->daddr, &ipv6h->daddr))
+				continue;
+
+			seglen = ntohs(ipv6h_check->payload_len);
+		} else {
+			WARN_ON(1);  /* shouldn't happen */
+			continue;
+		}
+
+		/* stricter criteria apply to ACKs that we may filter
+		 * 3 reserved flags must be unset to avoid future breakage
+		 * ECE/CWR/NS can be safely ignored
+		 * ACK must be set
+		 * All other flags URG/PSH/RST/SYN/FIN must be unset
+		 * 0x0FFF0000 = all TCP flags (confirm ACK=1, others zero)
+		 * 0x01C00000 = NS/CWR/ECE (safe to ignore)
+		 * 0x0E3F0000 = 0x0FFF0000 & ~0x01C00000
+		 * must be 'pure' ACK, contain zero bytes of segment data
+		 * options are ignored
+		 */
+		if (((tcp_flag_word(tcph_check) &
+			   cpu_to_be32(0x0E3F0000)) != TCP_FLAG_ACK) ||
+			 ((seglen - __tcp_hdrlen(tcph_check)) != 0))
+			continue;
+
+		/* The triggering packet must ACK more data than the ACK under
+		 * consideration, either because it has a strictly higher ACK
+		 * sequence number or because it is a SACK
+		 */
+		if ((ntohl(tcph_check->ack_seq) == ntohl(tcph->ack_seq) &&
+		     !cake_tcph_is_sack(tcph)) ||
+		    (int32_t)(ntohl(tcph_check->ack_seq) -
+			      ntohl(tcph->ack_seq)) > 0)
+			continue;
+
+		/* At this point we have found an eligible pure ACK to drop; if
+		 * we are in aggressive mode, we are done. Otherwise, keep
+		 * searching unless this is the second eligible ACK we
+		 * found.
+		 *
+		 * Since we want to drop ACK closest to the head of the queue,
+		 * save the first eligible ACK we find, even if we need to loop
+		 * again.
+		 */
+		if (!elig_ack) {
+			elig_ack = skb_check;
+			elig_ack_prev = skb_prev;
+		}
+
+		if (num_found++ > 0 || aggressive)
+			goto found;
+	}
+
+	return NULL;
+
+found:
+	if (elig_ack_prev)
+		elig_ack_prev->next = elig_ack->next;
+	else
+		flow->head = elig_ack->next;
+
+	elig_ack->next = NULL;
+
+	return elig_ack;
+}
+
 static u64 cake_ewma(u64 avg, u64 sample, u32 shift)
 {
 	avg -= avg >> shift;
@@ -931,6 +1164,7 @@  static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	struct cake_sched_data *q = qdisc_priv(sch);
 	int len = qdisc_pkt_len(skb);
 	int uninitialized_var(ret);
+	struct sk_buff *ack = NULL;
 	ktime_t now = ktime_get();
 	struct cake_tin_data *b;
 	struct cake_flow *flow;
@@ -976,8 +1210,24 @@  static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	cobalt_set_enqueue_time(skb, now);
 	flow_queue_add(flow, skb);
 
-	sch->q.qlen++;
-	q->buffer_used      += skb->truesize;
+	if (q->ack_filter)
+		ack = cake_ack_filter(q, flow);
+
+	if (ack) {
+		b->ack_drops++;
+		sch->qstats.drops++;
+		b->bytes += qdisc_pkt_len(ack);
+		len -= qdisc_pkt_len(ack);
+		q->buffer_used += skb->truesize - ack->truesize;
+		if (q->rate_flags & CAKE_FLAG_INGRESS)
+			cake_advance_shaper(q, b, ack, now, true);
+
+		qdisc_tree_reduce_backlog(sch, 1, qdisc_pkt_len(ack));
+		consume_skb(ack);
+	} else {
+		sch->q.qlen++;
+		q->buffer_used      += skb->truesize;
+	}
 
 	/* stats */
 	b->packets++;
@@ -1505,6 +1755,9 @@  static int cake_change(struct Qdisc *sch, struct nlattr *opt,
 			q->rate_flags &= ~CAKE_FLAG_INGRESS;
 	}
 
+	if (tb[TCA_CAKE_ACK_FILTER])
+		q->ack_filter = nla_get_u32(tb[TCA_CAKE_ACK_FILTER]);
+
 	if (tb[TCA_CAKE_MEMORY])
 		q->buffer_config_limit = nla_get_u32(tb[TCA_CAKE_MEMORY]);
 
@@ -1636,6 +1889,9 @@  static int cake_dump(struct Qdisc *sch, struct sk_buff *skb)
 			!!(q->rate_flags & CAKE_FLAG_INGRESS)))
 		goto nla_put_failure;
 
+	if (nla_put_u32(skb, TCA_CAKE_ACK_FILTER, q->ack_filter))
+		goto nla_put_failure;
+
 	return nla_nest_end(skb, opts);
 
 nla_put_failure: