[net] sch_netem: fix skb leak in netem_enqueue()

Message ID 1520014608-512-1-git-send-email-alexey.kodanev@oracle.com
State Superseded, archived
Delegated to: David Miller
Series [net] sch_netem: fix skb leak in netem_enqueue()

Commit Message

Alexey Kodanev March 2, 2018, 6:16 p.m. UTC
When we exceed the current packet limit and have more than one
segment in the list returned by skb_gso_segment(), netem drops
only the first one, skipping the rest, hence kmemleak reports:

unreferenced object 0xffff880b5d23b600 (size 1024):
  comm "softirq", pid 0, jiffies 4384527763 (age 2770.629s)
  hex dump (first 32 bytes):
    00 80 23 5d 0b 88 ff ff 00 00 00 00 00 00 00 00  ..#]............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000d8a19b9d>] __alloc_skb+0xc9/0x520
    [<000000001709b32f>] skb_segment+0x8c8/0x3710
    [<00000000c7b9bb88>] tcp_gso_segment+0x331/0x1830
    [<00000000c921cba1>] inet_gso_segment+0x476/0x1370
    [<000000008b762dd4>] skb_mac_gso_segment+0x1f9/0x510
    [<000000002182660a>] __skb_gso_segment+0x1dd/0x620
    [<00000000412651b9>] netem_enqueue+0x1536/0x2590 [sch_netem]
    [<0000000005d3b2a9>] __dev_queue_xmit+0x1167/0x2120
    [<00000000fc5f7327>] ip_finish_output2+0x998/0xf00
    [<00000000d309e9d3>] ip_output+0x1aa/0x2c0
    [<000000007ecbd3a4>] tcp_transmit_skb+0x18db/0x3670
    [<0000000042d2a45f>] tcp_write_xmit+0x4d4/0x58c0
    [<0000000056a44199>] tcp_tasklet_func+0x3d9/0x540
    [<0000000013d06d02>] tasklet_action+0x1ca/0x250
    [<00000000fcde0b8b>] __do_softirq+0x1b4/0x5a3
    [<00000000e7ed027c>] irq_exit+0x1e2/0x210

Fix it by adding the rest of the segments, if any, to the skb
'to_free' list in that case.

Fixes: 6071bd1aa13e ("netem: Segment GSO packets on enqueue")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
---
 net/sched/sch_netem.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Stephen Hemminger March 2, 2018, 6:44 p.m. UTC | #1
On Fri,  2 Mar 2018 21:16:48 +0300
Alexey Kodanev <alexey.kodanev@oracle.com> wrote:

> When we exceed current packets limit and have more than one
> segment in the list returned by skb_gso_segment(), netem drops
> only the first one, skipping the rest, hence kmemleak reports:
> 
> unreferenced object 0xffff880b5d23b600 (size 1024):
>   comm "softirq", pid 0, jiffies 4384527763 (age 2770.629s)
>   hex dump (first 32 bytes):
>     00 80 23 5d 0b 88 ff ff 00 00 00 00 00 00 00 00  ..#]............
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<00000000d8a19b9d>] __alloc_skb+0xc9/0x520
>     [<000000001709b32f>] skb_segment+0x8c8/0x3710
>     [<00000000c7b9bb88>] tcp_gso_segment+0x331/0x1830
>     [<00000000c921cba1>] inet_gso_segment+0x476/0x1370
>     [<000000008b762dd4>] skb_mac_gso_segment+0x1f9/0x510
>     [<000000002182660a>] __skb_gso_segment+0x1dd/0x620
>     [<00000000412651b9>] netem_enqueue+0x1536/0x2590 [sch_netem]
>     [<0000000005d3b2a9>] __dev_queue_xmit+0x1167/0x2120
>     [<00000000fc5f7327>] ip_finish_output2+0x998/0xf00
>     [<00000000d309e9d3>] ip_output+0x1aa/0x2c0
>     [<000000007ecbd3a4>] tcp_transmit_skb+0x18db/0x3670
>     [<0000000042d2a45f>] tcp_write_xmit+0x4d4/0x58c0
>     [<0000000056a44199>] tcp_tasklet_func+0x3d9/0x540
>     [<0000000013d06d02>] tasklet_action+0x1ca/0x250
>     [<00000000fcde0b8b>] __do_softirq+0x1b4/0x5a3
>     [<00000000e7ed027c>] irq_exit+0x1e2/0x210
> 
> Fix it by adding the rest of the segments, if any, to skb
> 'to_free' list in that case.
> 
> Fixes: 6071bd1aa13e ("netem: Segment GSO packets on enqueue")
> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
> ---
>  net/sched/sch_netem.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index 7c179ad..a5023a2 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -508,8 +508,14 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>  			1<<(prandom_u32() % 8);
>  	}
>  
> -	if (unlikely(sch->q.qlen >= sch->limit))
> +	if (unlikely(sch->q.qlen >= sch->limit)) {
> +		while (segs) {
> +			skb2 = segs->next;
> +			__qdisc_drop(segs, to_free);
> +			segs = skb2;
> +		}
>  		return qdisc_drop(skb, sch, to_free);
> +	}
>  
>  	qdisc_qstats_backlog_inc(sch, skb);
>  

Since this is a generic problem, why is it not fixed in qdisc_drop() instead?
Eric Dumazet March 3, 2018, 3:30 a.m. UTC | #2
On Fri, 2018-03-02 at 10:44 -0800, Stephen Hemminger wrote:
> On Fri,  2 Mar 2018 21:16:48 +0300
> 
> Since this is a generic problem why is not fixed in qdisc_drop instead?

AFAIK only netem and tbf might segment GSO packets so far.

I am not sure we want to add code to qdisc_drop(), which is inline
code used under stress on normal qdiscs.
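
For reference, the existing drop helpers in include/net/sch_generic.h
are tiny inlines, roughly like the sketch below (a paraphrase of the
helpers around this time, not a verbatim copy of the tree under
discussion), which is why growing them with a list walk is unattractive:

static inline void __qdisc_drop(struct sk_buff *skb, struct sk_buff **to_free)
{
	skb->next = *to_free;
	*to_free = skb;
}

static inline int qdisc_drop(struct sk_buff *skb, struct Qdisc *sch,
			     struct sk_buff **to_free)
{
	/* queue the skb on the to_free list and account one drop */
	__qdisc_drop(skb, to_free);
	qdisc_qstats_drop(sch);

	return NET_XMIT_DROP;
}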
Neil Horman March 3, 2018, 12:20 p.m. UTC | #3
On Fri, Mar 02, 2018 at 09:16:48PM +0300, Alexey Kodanev wrote:
> When we exceed current packets limit and have more than one
> segment in the list returned by skb_gso_segment(), netem drops
> only the first one, skipping the rest, hence kmemleak reports:
> 
> unreferenced object 0xffff880b5d23b600 (size 1024):
>   comm "softirq", pid 0, jiffies 4384527763 (age 2770.629s)
>   hex dump (first 32 bytes):
>     00 80 23 5d 0b 88 ff ff 00 00 00 00 00 00 00 00  ..#]............
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<00000000d8a19b9d>] __alloc_skb+0xc9/0x520
>     [<000000001709b32f>] skb_segment+0x8c8/0x3710
>     [<00000000c7b9bb88>] tcp_gso_segment+0x331/0x1830
>     [<00000000c921cba1>] inet_gso_segment+0x476/0x1370
>     [<000000008b762dd4>] skb_mac_gso_segment+0x1f9/0x510
>     [<000000002182660a>] __skb_gso_segment+0x1dd/0x620
>     [<00000000412651b9>] netem_enqueue+0x1536/0x2590 [sch_netem]
>     [<0000000005d3b2a9>] __dev_queue_xmit+0x1167/0x2120
>     [<00000000fc5f7327>] ip_finish_output2+0x998/0xf00
>     [<00000000d309e9d3>] ip_output+0x1aa/0x2c0
>     [<000000007ecbd3a4>] tcp_transmit_skb+0x18db/0x3670
>     [<0000000042d2a45f>] tcp_write_xmit+0x4d4/0x58c0
>     [<0000000056a44199>] tcp_tasklet_func+0x3d9/0x540
>     [<0000000013d06d02>] tasklet_action+0x1ca/0x250
>     [<00000000fcde0b8b>] __do_softirq+0x1b4/0x5a3
>     [<00000000e7ed027c>] irq_exit+0x1e2/0x210
> 
> Fix it by adding the rest of the segments, if any, to skb
> 'to_free' list in that case.
> 
> Fixes: 6071bd1aa13e ("netem: Segment GSO packets on enqueue")
> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
> ---
>  net/sched/sch_netem.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index 7c179ad..a5023a2 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -508,8 +508,14 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>  			1<<(prandom_u32() % 8);
>  	}
>  
> -	if (unlikely(sch->q.qlen >= sch->limit))
> +	if (unlikely(sch->q.qlen >= sch->limit)) {
> +		while (segs) {
> +			skb2 = segs->next;
> +			__qdisc_drop(segs, to_free);
> +			segs = skb2;
> +		}
>  		return qdisc_drop(skb, sch, to_free);
> +	}
>  
It seems like it might be nice to wrap this drop loop up into a
qdisc_drop_all() inline function.  Then we can easily drop segments in
other locations should we need to.

Regards
Neil
 
>  	qdisc_qstats_backlog_inc(sch, skb);
>  
> -- 
> 1.8.3.1
> 
>
Alexey Kodanev March 5, 2018, 12:57 p.m. UTC | #4
On 03/03/2018 03:20 PM, Neil Horman wrote:
> On Fri, Mar 02, 2018 at 09:16:48PM +0300, Alexey Kodanev wrote:
>> When we exceed current packets limit and have more than one
>> segment in the list returned by skb_gso_segment(), netem drops
>> only the first one, skipping the rest, hence kmemleak reports:
>>
...
>>
>> Fix it by adding the rest of the segments, if any, to skb
>> 'to_free' list in that case.
>>
>> Fixes: 6071bd1aa13e ("netem: Segment GSO packets on enqueue")
>> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
>> ---
>>  net/sched/sch_netem.c | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
>> index 7c179ad..a5023a2 100644
>> --- a/net/sched/sch_netem.c
>> +++ b/net/sched/sch_netem.c
>> @@ -508,8 +508,14 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>>  			1<<(prandom_u32() % 8);
>>  	}
>>  
>> -	if (unlikely(sch->q.qlen >= sch->limit))
>> +	if (unlikely(sch->q.qlen >= sch->limit)) {
>> +		while (segs) {
>> +			skb2 = segs->next;
>> +			__qdisc_drop(segs, to_free);
>> +			segs = skb2;
>> +		}
>>  		return qdisc_drop(skb, sch, to_free);
>> +	}
>>  
> It seems like it might be nice to wrap up this drop loop into a
> qdisc_drop_all inline function.  Then we can easily drop segments in other
> locations if we should need to


Agree, will prepare the patch. I guess we could just add 'segs' to the
'to_free' list, then add a qdisc_drop_all() helper that bumps the stats
counter and returns the drop status, something like this:

@@ -824,6 +824,18 @@ static inline void __qdisc_drop(struct sk_buff *skb, struct sk_buff **to_free)
        *to_free = skb;
 }

+static inline void __qdisc_drop_all(struct sk_buff *skb,
+                                   struct sk_buff **to_free)
+{
+       struct sk_buff *first = skb;
+
+       while (skb->next)
+               skb = skb->next;
+
+       skb->next = *to_free;
+       *to_free = first;
+}
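
A wrapper mirroring qdisc_drop() might then look something like the
sketch below (just the direction outlined above, not the final patch;
the exact stats accounting may differ), so netem could relink 'segs'
onto the skb and drop the whole list with a single call:

static inline int qdisc_drop_all(struct sk_buff *skb, struct Qdisc *sch,
				 struct sk_buff **to_free)
{
	/* splice the whole segment list onto to_free, count the drop */
	__qdisc_drop_all(skb, to_free);
	qdisc_qstats_drop(sch);

	return NET_XMIT_DROP;
}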

Thanks,
Alexey
Eric Dumazet March 5, 2018, 3:13 p.m. UTC | #5
On Mon, 2018-03-05 at 15:57 +0300, Alexey Kodanev wrote:
> 
> +static inline void __qdisc_drop_all(struct sk_buff *skb,
> +                                   struct sk_buff **to_free)
> +{
> +       struct sk_buff *first = skb;
> +
> +       while (skb->next)
> +               skb = skb->next;
> +
> +       skb->next = *to_free;
> +       *to_free = first;
> +}

You probably can leverage what I did for commit bec3cfdca36bf43cfa
("net: skb_segment() provides list head and tail")
to avoid the iteration.
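
With the tail of the segment list already stored in skb->prev by
skb_segment() (per that commit), the walk could collapse to roughly
the sketch below; the skb->prev check is an assumption here, to keep
a plain non-segmented skb safe:

static inline void __qdisc_drop_all(struct sk_buff *skb,
				    struct sk_buff **to_free)
{
	if (skb->prev)
		skb->prev->next = *to_free;	/* splice the whole list */
	else
		skb->next = *to_free;
	*to_free = skb;
}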
Alexey Kodanev March 5, 2018, 3:40 p.m. UTC | #6
On 03/05/2018 06:13 PM, Eric Dumazet wrote:
> On Mon, 2018-03-05 at 15:57 +0300, Alexey Kodanev wrote:
>>
>> +static inline void __qdisc_drop_all(struct sk_buff *skb,
>> +                                   struct sk_buff **to_free)
>> +{
>> +       struct sk_buff *first = skb;
>> +
>> +       while (skb->next)
>> +               skb = skb->next;
>> +
>> +       skb->next = *to_free;
>> +       *to_free = first;
>> +}
> 
> You probably can leverage what I did for commit bec3cfdca36bf43cfa
> ("net: skb_segment() provides list head and tail")
> to avoid the iteration.

OK, thanks! Will use it and send a new version of the patch.

Thanks,
Alexey
Neil Horman March 5, 2018, 11:11 p.m. UTC | #7
On Mon, Mar 05, 2018 at 03:57:52PM +0300, Alexey Kodanev wrote:
> On 03/03/2018 03:20 PM, Neil Horman wrote:
> > On Fri, Mar 02, 2018 at 09:16:48PM +0300, Alexey Kodanev wrote:
> >> When we exceed current packets limit and have more than one
> >> segment in the list returned by skb_gso_segment(), netem drops
> >> only the first one, skipping the rest, hence kmemleak reports:
> >>
> ...
> >>
> >> Fix it by adding the rest of the segments, if any, to skb
> >> 'to_free' list in that case.
> >>
> >> Fixes: 6071bd1aa13e ("netem: Segment GSO packets on enqueue")
> >> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
> >> ---
> >>  net/sched/sch_netem.c | 8 +++++++-
> >>  1 file changed, 7 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> >> index 7c179ad..a5023a2 100644
> >> --- a/net/sched/sch_netem.c
> >> +++ b/net/sched/sch_netem.c
> >> @@ -508,8 +508,14 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> >>  			1<<(prandom_u32() % 8);
> >>  	}
> >>  
> >> -	if (unlikely(sch->q.qlen >= sch->limit))
> >> +	if (unlikely(sch->q.qlen >= sch->limit)) {
> >> +		while (segs) {
> >> +			skb2 = segs->next;
> >> +			__qdisc_drop(segs, to_free);
> >> +			segs = skb2;
> >> +		}
> >>  		return qdisc_drop(skb, sch, to_free);
> >> +	}
> >>  
> > It seems like it might be nice to wrap up this drop loop into a
> > qdisc_drop_all inline function.  Then we can easily drop segments in other
> > locations if we should need to
> 
> 
> Agree, will prepare the patch. I guess we could just add 'segs' to 'to_free'
> list, then add qdisc_drop_all() with stats counter and returning status,
> something like this:
> 
> @@ -824,6 +824,18 @@ static inline void __qdisc_drop(struct sk_buff *skb, struct sk_buff **to_free)
>         *to_free = skb;
>  }
> 
> +static inline void __qdisc_drop_all(struct sk_buff *skb,
> +                                   struct sk_buff **to_free)
> +{
> +       struct sk_buff *first = skb;
> +
> +       while (skb->next)
> +               skb = skb->next;
> +
> +       skb->next = *to_free;
> +       *to_free = first;
> +}
> 
 I agree

Thanks!
Neil

> Thanks,
> Alexey
>

Patch

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 7c179ad..a5023a2 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -508,8 +508,14 @@  static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 			1<<(prandom_u32() % 8);
 	}
 
-	if (unlikely(sch->q.qlen >= sch->limit))
+	if (unlikely(sch->q.qlen >= sch->limit)) {
+		while (segs) {
+			skb2 = segs->next;
+			__qdisc_drop(segs, to_free);
+			segs = skb2;
+		}
 		return qdisc_drop(skb, sch, to_free);
+	}
 
 	qdisc_qstats_backlog_inc(sch, skb);