Message ID | 20190612185121.4175-1-jakub.kicinski@netronome.com |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Series | [net] net: netem: fix use after free and double free with packet corruption |
On Wed, Jun 12, 2019 at 11:52 AM Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
>
> Brendan reports that the use of netem's packet corruption capability
> leads to strange crashes.  This seems to be caused by
> commit d66280b12bd7 ("net: netem: use a list in addition to rbtree")
> which uses skb->next pointer to construct a fast-path queue of
> in-order skbs.
>
> Packet corruption code has to invoke skb_gso_segment() in case
> of skbs in need of GSO.  skb_gso_segment() returns a list of
> skbs.  If next pointers of the skbs on that list do not get cleared
> fast path list goes into the weeds and tries to access the next
> segment skb multiple times.

Mind to be more specific? How could it be accessed multiple times?

>
> Reported-by: Brendan Galloway <brendan.galloway@netronome.com>
> Fixes: d66280b12bd7 ("net: netem: use a list in addition to rbtree")
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
> ---
>  net/sched/sch_netem.c | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index 956ff3da81f4..1fd4405611e5 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -494,16 +494,13 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>  	 */
>  	if (q->corrupt && q->corrupt >= get_crandom(&q->corrupt_cor)) {
>  		if (skb_is_gso(skb)) {
> -			segs = netem_segment(skb, sch, to_free);
> -			if (!segs)
> +			skb = netem_segment(skb, sch, to_free);
> +			if (!skb)
>  				return rc_drop;
> -		} else {
> -			segs = skb;
> +			segs = skb->next;
> +			skb_mark_not_on_list(skb);
>  		}
>
> -		skb = segs;
> -		segs = segs->next;
> -

I don't see how this works when we hit goto finish_segs?
Either goto finish_segs can be removed or needs to be fixed?

Thanks.
On Fri, 14 Jun 2019 09:40:18 -0700, Cong Wang wrote:
> On Wed, Jun 12, 2019 at 11:52 AM Jakub Kicinski wrote:
> >
> > Brendan reports that the use of netem's packet corruption capability
> > leads to strange crashes.  This seems to be caused by
> > commit d66280b12bd7 ("net: netem: use a list in addition to rbtree")
> > which uses skb->next pointer to construct a fast-path queue of
> > in-order skbs.
> >
> > Packet corruption code has to invoke skb_gso_segment() in case
> > of skbs in need of GSO.  skb_gso_segment() returns a list of
> > skbs.  If next pointers of the skbs on that list do not get cleared
> > fast path list goes into the weeds and tries to access the next
> > segment skb multiple times.
>
> Mind to be more specific? How could it be accessed multiple times?

You're right, the commit message is not great :S

So we segment an skb and get a list:

    A -> B -> C

And then we hook in A to the t_head t_tail list:

    h t
    |/
    A -> B -> C

Now if B and C also get enqueued successfully all is fine, because
we will overwrite the list in order.  IOW:

    h    t
    |    |
    A -> B -> C

    h         t
    |         |
    A -> B -> C

But if B and C get reordered we may end up with

    h t            RB
    |/             |
    A -> B -> C    B
                    \
                     C

Or if they get dropped (overlimits) just:

    h t
    |/
    A -> B -> C

while A and B are already freed.
> > Reported-by: Brendan Galloway <brendan.galloway@netronome.com>
> > Fixes: d66280b12bd7 ("net: netem: use a list in addition to rbtree")
> > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
> > ---
> >  net/sched/sch_netem.c | 11 ++++-------
> >  1 file changed, 4 insertions(+), 7 deletions(-)
> >
> > diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> > index 956ff3da81f4..1fd4405611e5 100644
> > --- a/net/sched/sch_netem.c
> > +++ b/net/sched/sch_netem.c
> > @@ -494,16 +494,13 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> >  	 */
> >  	if (q->corrupt && q->corrupt >= get_crandom(&q->corrupt_cor)) {
> >  		if (skb_is_gso(skb)) {
> > -			segs = netem_segment(skb, sch, to_free);
> > -			if (!segs)
> > +			skb = netem_segment(skb, sch, to_free);
> > +			if (!skb)
> >  				return rc_drop;
> > -		} else {
> > -			segs = skb;
> > +			segs = skb->next;
> > +			skb_mark_not_on_list(skb);
> >  		}
> >
> > -		skb = segs;
> > -		segs = segs->next;
> > -
>
> I don't see how this works when we hit goto finish_segs?
> Either goto finish_segs can be removed or needs to be fixed?

Note that I'm removing the else branch.  So for non-GSO we end up with:

    skb  = original
    segs = NULL

for GSO we end up with:

    skb  = first seg (->next == NULL)
    segs = second seg (->next == third, etc.)

So should work all as is?
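To make the list problem and the pointer states above concrete, here is a minimal user-space sketch. It is not kernel code: `struct pkt`, `struct fifo`, `fifo_append()`, `fifo_reachable()` and `detach_first()` are hypothetical stand-ins for `struct sk_buff`, the `t_head`/`t_tail` list and the `segs = skb->next; skb_mark_not_on_list(skb);` pair from the patch. It assumes, as the thread describes, that the tail append never clears the `next` pointer of the packet it links in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff; only the link matters here. */
struct pkt {
	struct pkt *next;
	char name;
};

/* Hypothetical stand-in for netem's t_head/t_tail fast-path list. */
struct fifo {
	struct pkt *head;
	struct pkt *tail;
};

/* Tail append that never writes p->next (assumed to match the thread). */
static void fifo_append(struct fifo *q, struct pkt *p)
{
	if (q->tail)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
}

/* Count the packets reachable from head by following next pointers. */
static int fifo_reachable(const struct fifo *q)
{
	int n = 0;

	for (const struct pkt *p = q->head; p; p = p->next)
		n++;
	return n;
}

/* Split off the first segment: return the rest and clear first->next. */
static struct pkt *detach_first(struct pkt *first)
{
	struct pkt *rest = first->next;

	first->next = NULL;
	return rest;
}

/* Enqueue only A while it is still chained to B and C. */
int demo_without_detach(void)
{
	struct pkt c = { NULL, 'C' }, b = { &c, 'B' }, a = { &b, 'A' };
	struct fifo q = { NULL, NULL };

	fifo_append(&q, &a);
	/* B and C were never enqueued, yet a walk from head reaches them. */
	return fifo_reachable(&q);
}

/* Detach A from the segment list first, then enqueue it. */
int demo_with_detach(void)
{
	struct pkt c = { NULL, 'C' }, b = { &c, 'B' }, a = { &b, 'A' };
	struct fifo q = { NULL, NULL };
	struct pkt *segs = detach_first(&a);

	assert(segs == &b && segs->next == &c); /* rest of the list is intact */
	fifo_append(&q, &a);
	return fifo_reachable(&q);
}
```

In this toy model `demo_without_detach()` returns 3 and `demo_with_detach()` returns 1. If B and C were freed or queued elsewhere in the first case, a later walk from `head` would touch them again, which is the use after free and double free scenario drawn in the diagrams.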
From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Wed, 12 Jun 2019 11:51:21 -0700

> Brendan reports that the use of netem's packet corruption capability
> leads to strange crashes.  This seems to be caused by
> commit d66280b12bd7 ("net: netem: use a list in addition to rbtree")
> which uses skb->next pointer to construct a fast-path queue of
> in-order skbs.
>
> Packet corruption code has to invoke skb_gso_segment() in case
> of skbs in need of GSO.  skb_gso_segment() returns a list of
> skbs.  If next pointers of the skbs on that list do not get cleared
> fast path list goes into the weeds and tries to access the next
> segment skb multiple times.
>
> Reported-by: Brendan Galloway <brendan.galloway@netronome.com>
> Fixes: d66280b12bd7 ("net: netem: use a list in addition to rbtree")
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>

Please rework the commit message a bit to make things clearer, your
ascii diagrams would be great. :)
On Fri, 14 Jun 2019 19:08:08 -0700 (PDT), David Miller wrote:
> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> Date: Wed, 12 Jun 2019 11:51:21 -0700
>
> > Brendan reports that the use of netem's packet corruption capability
> > leads to strange crashes.  This seems to be caused by
> > commit d66280b12bd7 ("net: netem: use a list in addition to rbtree")
> > which uses skb->next pointer to construct a fast-path queue of
> > in-order skbs.
> >
> > Packet corruption code has to invoke skb_gso_segment() in case
> > of skbs in need of GSO.  skb_gso_segment() returns a list of
> > skbs.  If next pointers of the skbs on that list do not get cleared
> > fast path list goes into the weeds and tries to access the next
> > segment skb multiple times.
> >
> > Reported-by: Brendan Galloway <brendan.galloway@netronome.com>
> > Fixes: d66280b12bd7 ("net: netem: use a list in addition to rbtree")
> > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
>
> Please rework the commit message a bit to make things clearer, your
> ascii diagrams would be great. :)

In the process of rewriting the commit message I found a memory leak,
and the backlog accounting is also buggy in the segmentation path:

    qdisc netem 8001: root refcnt 64 limit 100 delay 19us corrupt 1%
     Sent 30237896 bytes 19895 pkt (dropped 1885, overlimits 0 requeues 287)
     backlog 0b 99p requeues 287
             ^^^^^^
             99 packets but 0 bytes

I need an internal review, and will repost soon.

I need to stop looking for bugs here 🙈
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 956ff3da81f4..1fd4405611e5 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -494,16 +494,13 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	 */
 	if (q->corrupt && q->corrupt >= get_crandom(&q->corrupt_cor)) {
 		if (skb_is_gso(skb)) {
-			segs = netem_segment(skb, sch, to_free);
-			if (!segs)
+			skb = netem_segment(skb, sch, to_free);
+			if (!skb)
 				return rc_drop;
-		} else {
-			segs = skb;
+			segs = skb->next;
+			skb_mark_not_on_list(skb);
 		}
 
-		skb = segs;
-		segs = segs->next;
-
 		skb = skb_unshare(skb, GFP_ATOMIC);
 		if (unlikely(!skb)) {
 			qdisc_qstats_drop(sch);