Message ID: 1520014608-512-1-git-send-email-alexey.kodanev@oracle.com
State: Superseded, archived
Delegated to: David Miller
Series: [net] sch_netem: fix skb leak in netem_enqueue()
On Fri, 2 Mar 2018 21:16:48 +0300 Alexey Kodanev <alexey.kodanev@oracle.com> wrote:

> When we exceed current packets limit and have more than one
> segment in the list returned by skb_gso_segment(), netem drops
> only the first one, skipping the rest, hence kmemleak reports:
>
> unreferenced object 0xffff880b5d23b600 (size 1024):
>   comm "softirq", pid 0, jiffies 4384527763 (age 2770.629s)
>   hex dump (first 32 bytes):
>     00 80 23 5d 0b 88 ff ff 00 00 00 00 00 00 00 00  ..#]............
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<00000000d8a19b9d>] __alloc_skb+0xc9/0x520
>     [<000000001709b32f>] skb_segment+0x8c8/0x3710
>     [<00000000c7b9bb88>] tcp_gso_segment+0x331/0x1830
>     [<00000000c921cba1>] inet_gso_segment+0x476/0x1370
>     [<000000008b762dd4>] skb_mac_gso_segment+0x1f9/0x510
>     [<000000002182660a>] __skb_gso_segment+0x1dd/0x620
>     [<00000000412651b9>] netem_enqueue+0x1536/0x2590 [sch_netem]
>     [<0000000005d3b2a9>] __dev_queue_xmit+0x1167/0x2120
>     [<00000000fc5f7327>] ip_finish_output2+0x998/0xf00
>     [<00000000d309e9d3>] ip_output+0x1aa/0x2c0
>     [<000000007ecbd3a4>] tcp_transmit_skb+0x18db/0x3670
>     [<0000000042d2a45f>] tcp_write_xmit+0x4d4/0x58c0
>     [<0000000056a44199>] tcp_tasklet_func+0x3d9/0x540
>     [<0000000013d06d02>] tasklet_action+0x1ca/0x250
>     [<00000000fcde0b8b>] __do_softirq+0x1b4/0x5a3
>     [<00000000e7ed027c>] irq_exit+0x1e2/0x210
>
> Fix it by adding the rest of the segments, if any, to skb
> 'to_free' list in that case.
>
> Fixes: 6071bd1aa13e ("netem: Segment GSO packets on enqueue")
> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
> ---
>  net/sched/sch_netem.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index 7c179ad..a5023a2 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -508,8 +508,14 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>  				     1<<(prandom_u32() % 8);
>  	}
>
> -	if (unlikely(sch->q.qlen >= sch->limit))
> +	if (unlikely(sch->q.qlen >= sch->limit)) {
> +		while (segs) {
> +			skb2 = segs->next;
> +			__qdisc_drop(segs, to_free);
> +			segs = skb2;
> +		}
>  		return qdisc_drop(skb, sch, to_free);
> +	}
>
>  	qdisc_qstats_backlog_inc(sch, skb);

Since this is a generic problem, why is it not fixed in qdisc_drop() instead?
On Fri, 2018-03-02 at 10:44 -0800, Stephen Hemminger wrote:
> On Fri, 2 Mar 2018 21:16:48 +0300
>
> Since this is a generic problem why is not fixed in qdisc_drop instead?

AFAIK only netem and tbf might segment GSO packets so far. I am not sure we
want to add code to qdisc_drop(), which is inline code used under stress on
normal qdiscs.
On Fri, Mar 02, 2018 at 09:16:48PM +0300, Alexey Kodanev wrote:
> When we exceed current packets limit and have more than one
> segment in the list returned by skb_gso_segment(), netem drops
> only the first one, skipping the rest, hence kmemleak reports:
>
> unreferenced object 0xffff880b5d23b600 (size 1024):
>   comm "softirq", pid 0, jiffies 4384527763 (age 2770.629s)
>   hex dump (first 32 bytes):
>     00 80 23 5d 0b 88 ff ff 00 00 00 00 00 00 00 00  ..#]............
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<00000000d8a19b9d>] __alloc_skb+0xc9/0x520
>     [<000000001709b32f>] skb_segment+0x8c8/0x3710
>     [<00000000c7b9bb88>] tcp_gso_segment+0x331/0x1830
>     [<00000000c921cba1>] inet_gso_segment+0x476/0x1370
>     [<000000008b762dd4>] skb_mac_gso_segment+0x1f9/0x510
>     [<000000002182660a>] __skb_gso_segment+0x1dd/0x620
>     [<00000000412651b9>] netem_enqueue+0x1536/0x2590 [sch_netem]
>     [<0000000005d3b2a9>] __dev_queue_xmit+0x1167/0x2120
>     [<00000000fc5f7327>] ip_finish_output2+0x998/0xf00
>     [<00000000d309e9d3>] ip_output+0x1aa/0x2c0
>     [<000000007ecbd3a4>] tcp_transmit_skb+0x18db/0x3670
>     [<0000000042d2a45f>] tcp_write_xmit+0x4d4/0x58c0
>     [<0000000056a44199>] tcp_tasklet_func+0x3d9/0x540
>     [<0000000013d06d02>] tasklet_action+0x1ca/0x250
>     [<00000000fcde0b8b>] __do_softirq+0x1b4/0x5a3
>     [<00000000e7ed027c>] irq_exit+0x1e2/0x210
>
> Fix it by adding the rest of the segments, if any, to skb
> 'to_free' list in that case.
>
> Fixes: 6071bd1aa13e ("netem: Segment GSO packets on enqueue")
> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
> ---
>  net/sched/sch_netem.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index 7c179ad..a5023a2 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -508,8 +508,14 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>  				     1<<(prandom_u32() % 8);
>  	}
>
> -	if (unlikely(sch->q.qlen >= sch->limit))
> +	if (unlikely(sch->q.qlen >= sch->limit)) {
> +		while (segs) {
> +			skb2 = segs->next;
> +			__qdisc_drop(segs, to_free);
> +			segs = skb2;
> +		}
>  		return qdisc_drop(skb, sch, to_free);
> +	}
>

It seems like it might be nice to wrap up this drop loop into a
qdisc_drop_all inline function. Then we can easily drop segments in other
locations if we should need to.

Regards
Neil

>  	qdisc_qstats_backlog_inc(sch, skb);
>
> --
> 1.8.3.1
On 03/03/2018 03:20 PM, Neil Horman wrote:
> On Fri, Mar 02, 2018 at 09:16:48PM +0300, Alexey Kodanev wrote:
>> When we exceed current packets limit and have more than one
>> segment in the list returned by skb_gso_segment(), netem drops
>> only the first one, skipping the rest, hence kmemleak reports:
>> ...
>>
>> Fix it by adding the rest of the segments, if any, to skb
>> 'to_free' list in that case.
>>
>> Fixes: 6071bd1aa13e ("netem: Segment GSO packets on enqueue")
>> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
>> ---
>>  net/sched/sch_netem.c | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
>> index 7c179ad..a5023a2 100644
>> --- a/net/sched/sch_netem.c
>> +++ b/net/sched/sch_netem.c
>> @@ -508,8 +508,14 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>>  				     1<<(prandom_u32() % 8);
>>  	}
>>
>> -	if (unlikely(sch->q.qlen >= sch->limit))
>> +	if (unlikely(sch->q.qlen >= sch->limit)) {
>> +		while (segs) {
>> +			skb2 = segs->next;
>> +			__qdisc_drop(segs, to_free);
>> +			segs = skb2;
>> +		}
>>  		return qdisc_drop(skb, sch, to_free);
>> +	}
>>
> It seems like it might be nice to wrap up this drop loop into a
> qdisc_drop_all inline function. Then we can easily drop segments in other
> locations if we should need to

Agree, will prepare the patch. I guess we could just add 'segs' to 'to_free'
list, then add qdisc_drop_all() with stats counter and returning status,
something like this:

@@ -824,6 +824,18 @@ static inline void __qdisc_drop(struct sk_buff *skb, struct sk_buff **to_free)
 	*to_free = skb;
 }

+static inline void __qdisc_drop_all(struct sk_buff *skb,
+				    struct sk_buff **to_free)
+{
+	struct sk_buff *first = skb;
+
+	while (skb->next)
+		skb = skb->next;
+
+	skb->next = *to_free;
+	*to_free = first;
+}

Thanks,
Alexey
On Mon, 2018-03-05 at 15:57 +0300, Alexey Kodanev wrote:
>
> +static inline void __qdisc_drop_all(struct sk_buff *skb,
> +				    struct sk_buff **to_free)
> +{
> +	struct sk_buff *first = skb;
> +
> +	while (skb->next)
> +		skb = skb->next;
> +
> +	skb->next = *to_free;
> +	*to_free = first;
> +}

You probably can leverage what I did for commit bec3cfdca36bf43cfa
("net: skb_segment() provides list head and tail")
to avoid the iteration.
On 03/05/2018 06:13 PM, Eric Dumazet wrote:
> On Mon, 2018-03-05 at 15:57 +0300, Alexey Kodanev wrote:
>>
>> +static inline void __qdisc_drop_all(struct sk_buff *skb,
>> +				    struct sk_buff **to_free)
>> +{
>> +	struct sk_buff *first = skb;
>> +
>> +	while (skb->next)
>> +		skb = skb->next;
>> +
>> +	skb->next = *to_free;
>> +	*to_free = first;
>> +}
>
> You probably can leverage what I did for commit bec3cfdca36bf43cfa
> ("net: skb_segment() provides list head and tail")
> to avoid the iteration.

OK, thanks! Will use it and send a new version of the patch.

Thanks,
Alexey
On Mon, Mar 05, 2018 at 03:57:52PM +0300, Alexey Kodanev wrote:
> On 03/03/2018 03:20 PM, Neil Horman wrote:
> > On Fri, Mar 02, 2018 at 09:16:48PM +0300, Alexey Kodanev wrote:
> >> When we exceed current packets limit and have more than one
> >> segment in the list returned by skb_gso_segment(), netem drops
> >> only the first one, skipping the rest, hence kmemleak reports:
> >>
> ...
> >>
> >> Fix it by adding the rest of the segments, if any, to skb
> >> 'to_free' list in that case.
> >>
> >> Fixes: 6071bd1aa13e ("netem: Segment GSO packets on enqueue")
> >> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
> >> ---
> >>  net/sched/sch_netem.c | 8 +++++++-
> >>  1 file changed, 7 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> >> index 7c179ad..a5023a2 100644
> >> --- a/net/sched/sch_netem.c
> >> +++ b/net/sched/sch_netem.c
> >> @@ -508,8 +508,14 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> >>  				     1<<(prandom_u32() % 8);
> >>  	}
> >>
> >> -	if (unlikely(sch->q.qlen >= sch->limit))
> >> +	if (unlikely(sch->q.qlen >= sch->limit)) {
> >> +		while (segs) {
> >> +			skb2 = segs->next;
> >> +			__qdisc_drop(segs, to_free);
> >> +			segs = skb2;
> >> +		}
> >>  		return qdisc_drop(skb, sch, to_free);
> >> +	}
> >>
> > It seems like it might be nice to wrap up this drop loop into a
> > qdisc_drop_all inline function. Then we can easily drop segments in other
> > locations if we should need to
>
> Agree, will prepare the patch. I guess we could just add 'segs' to 'to_free'
> list, then add qdisc_drop_all() with stats counter and returning status,
> something like this:
>
> @@ -824,6 +824,18 @@ static inline void __qdisc_drop(struct sk_buff *skb, struct sk_buff **to_free)
>  	*to_free = skb;
>  }
>
> +static inline void __qdisc_drop_all(struct sk_buff *skb,
> +				    struct sk_buff **to_free)
> +{
> +	struct sk_buff *first = skb;
> +
> +	while (skb->next)
> +		skb = skb->next;
> +
> +	skb->next = *to_free;
> +	*to_free = first;
> +}

I agree

Thanks!
Neil

> Thanks,
> Alexey
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 7c179ad..a5023a2 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -508,8 +508,14 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 				     1<<(prandom_u32() % 8);
 	}
 
-	if (unlikely(sch->q.qlen >= sch->limit))
+	if (unlikely(sch->q.qlen >= sch->limit)) {
+		while (segs) {
+			skb2 = segs->next;
+			__qdisc_drop(segs, to_free);
+			segs = skb2;
+		}
 		return qdisc_drop(skb, sch, to_free);
+	}
 
 	qdisc_qstats_backlog_inc(sch, skb);
When we exceed current packets limit and have more than one
segment in the list returned by skb_gso_segment(), netem drops
only the first one, skipping the rest, hence kmemleak reports:

unreferenced object 0xffff880b5d23b600 (size 1024):
  comm "softirq", pid 0, jiffies 4384527763 (age 2770.629s)
  hex dump (first 32 bytes):
    00 80 23 5d 0b 88 ff ff 00 00 00 00 00 00 00 00  ..#]............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000d8a19b9d>] __alloc_skb+0xc9/0x520
    [<000000001709b32f>] skb_segment+0x8c8/0x3710
    [<00000000c7b9bb88>] tcp_gso_segment+0x331/0x1830
    [<00000000c921cba1>] inet_gso_segment+0x476/0x1370
    [<000000008b762dd4>] skb_mac_gso_segment+0x1f9/0x510
    [<000000002182660a>] __skb_gso_segment+0x1dd/0x620
    [<00000000412651b9>] netem_enqueue+0x1536/0x2590 [sch_netem]
    [<0000000005d3b2a9>] __dev_queue_xmit+0x1167/0x2120
    [<00000000fc5f7327>] ip_finish_output2+0x998/0xf00
    [<00000000d309e9d3>] ip_output+0x1aa/0x2c0
    [<000000007ecbd3a4>] tcp_transmit_skb+0x18db/0x3670
    [<0000000042d2a45f>] tcp_write_xmit+0x4d4/0x58c0
    [<0000000056a44199>] tcp_tasklet_func+0x3d9/0x540
    [<0000000013d06d02>] tasklet_action+0x1ca/0x250
    [<00000000fcde0b8b>] __do_softirq+0x1b4/0x5a3
    [<00000000e7ed027c>] irq_exit+0x1e2/0x210

Fix it by adding the rest of the segments, if any, to skb
'to_free' list in that case.

Fixes: 6071bd1aa13e ("netem: Segment GSO packets on enqueue")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
---
 net/sched/sch_netem.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)