Patchwork ftrace: tracepoint of net_dev_xmit sees freed skb and causes panic

login
register
mail settings
Submitter Koki Sanagi
Date May 31, 2011, 7:48 a.m.
Message ID <4DE49D52.709@jp.fujitsu.com>
Download mbox | patch
Permalink /patch/97967/
State Accepted
Delegated to: David Miller
Headers show

Comments

Koki Sanagi - May 31, 2011, 7:48 a.m.
Because there is a possibility that skb is kfree_skb()ed and zero cleared
after ndo_start_xmit, we should not see the contents of skb like skb->len and
skb->dev->name after ndo_start_xmit. But trace_net_dev_xmit does that
and causes panic by NULL pointer dereference.
This patch fixes trace_net_dev_xmit not to see the contents of skb directly.

If you want to reproduce this panic,

1. Get tracepoint of net_dev_xmit on
2. Create 2 guests on KVM
2. Make 2 guests use virtio_net
4. Execute netperf from one to another for a long time as a network burden
5. host will panic(It takes about 30 minutes)

Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
 include/trace/events/net.h |   12 +++++++-----
 net/core/dev.c             |    7 +++++--
 2 files changed, 12 insertions(+), 7 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steven Rostedt - May 31, 2011, 3:13 p.m.
On Tue, 2011-05-31 at 16:48 +0900, Koki Sanagi wrote:
> Because there is a possibility that skb is kfree_skb()ed and zero cleared
> after ndo_start_xmit, we should not see the contents of skb like skb->len and
> skb->dev->name after ndo_start_xmit. But trace_net_dev_xmit does that
> and causes panic by NULL pointer dereference.
> This patch fixes trace_net_dev_xmit not to see the contents of skb directly.


>  
>  	if (likely(!skb->next)) {
>  		u32 features;
> @@ -2139,8 +2140,9 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
>  			}
>  		}
>  
> +		skb_len = skb->len;
>  		rc = ops->ndo_start_xmit(skb, dev);
> -		trace_net_dev_xmit(skb, rc);
> +		trace_net_dev_xmit(skb, rc, dev, skb_len);
>  		if (rc == NETDEV_TX_OK)
>  			txq_trans_update(txq);
>  		return rc;
> @@ -2160,8 +2162,9 @@ gso:
>  		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
>  			skb_dst_drop(nskb);
>  
> +		skb_len = nskb->len;
>  		rc = ops->ndo_start_xmit(nskb, dev);
> -		trace_net_dev_xmit(nskb, rc);
> +		trace_net_dev_xmit(nskb, rc, dev, skb_len);

What if you just put the tracepoint before the call to
ops->ndo_start_xmit?

-- Steve

>  		if (unlikely(rc != NETDEV_TX_OK)) {
>  			if (rc & ~NETDEV_TX_MASK)
>  				goto out_kfree_gso_skb;


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steven Rostedt - May 31, 2011, 3:14 p.m.
Note, the subject should not be "ftrace:", but "net:" or maybe even
"net/tracing", as this really has nothing to do with ftrace code. The
tracepoints are more generic than ftrace.

-- Steve


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman - May 31, 2011, 4:11 p.m.
On Tue, May 31, 2011 at 11:13:11AM -0400, Steven Rostedt wrote:
> On Tue, 2011-05-31 at 16:48 +0900, Koki Sanagi wrote:
> > Because there is a possibility that skb is kfree_skb()ed and zero cleared
> > after ndo_start_xmit, we should not see the contents of skb like skb->len and
> > skb->dev->name after ndo_start_xmit. But trace_net_dev_xmit does that
> > and causes panic by NULL pointer dereference.
> > This patch fixes trace_net_dev_xmit not to see the contents of skb directly.
> 
> 
> >  
> >  	if (likely(!skb->next)) {
> >  		u32 features;
> > @@ -2139,8 +2140,9 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
> >  			}
> >  		}
> >  
> > +		skb_len = skb->len;
> >  		rc = ops->ndo_start_xmit(skb, dev);
> > -		trace_net_dev_xmit(skb, rc);
> > +		trace_net_dev_xmit(skb, rc, dev, skb_len);
> >  		if (rc == NETDEV_TX_OK)
> >  			txq_trans_update(txq);
> >  		return rc;
> > @@ -2160,8 +2162,9 @@ gso:
> >  		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
> >  			skb_dst_drop(nskb);
> >  
> > +		skb_len = nskb->len;
> >  		rc = ops->ndo_start_xmit(nskb, dev);
> > -		trace_net_dev_xmit(nskb, rc);
> > +		trace_net_dev_xmit(nskb, rc, dev, skb_len);
> 
> What if you just put the tracepoint before the call to
> ops->ndo_start_xmit?
> 
Then you won't know the return code of ndo_start_xmit, which this tracepoint
records.
Neil

> -- Steve
> 
> >  		if (unlikely(rc != NETDEV_TX_OK)) {
> >  			if (rc & ~NETDEV_TX_MASK)
> >  				goto out_kfree_gso_skb;
> 
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steven Rostedt - May 31, 2011, 4:21 p.m.
On Tue, 2011-05-31 at 12:11 -0400, Neil Horman wrote:
>  			skb_dst_drop(nskb);
> > >  
> > > +		skb_len = nskb->len;
> > >  		rc = ops->ndo_start_xmit(nskb, dev);
> > > -		trace_net_dev_xmit(nskb, rc);
> > > +		trace_net_dev_xmit(nskb, rc, dev, skb_len);
> > 
> > What if you just put the tracepoint before the call to
> > ops->ndo_start_xmit?
> > 
> Then you won't know the return code of ndo_start_xmit, which this tracepoint
> records.

Doh! Yeah, I see that now ;)

-- Steve


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - June 2, 2011, 8:59 p.m.
From: Steven Rostedt <rostedt@goodmis.org>
Date: Tue, 31 May 2011 11:14:27 -0400

> Note, the subject should not be "ftrace:", but "net:" or maybe even
> "net/tracing", as this really has nothing to do with ftrace code. The
> tracepoints are more generic than ftrace.

Do you guys mind if I take this in via the net-2.6 tree?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steven Rostedt - June 2, 2011, 9:01 p.m.
On Thu, 2011-06-02 at 13:59 -0700, David Miller wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
> Date: Tue, 31 May 2011 11:14:27 -0400
> 
> > Note, the subject should not be "ftrace:", but "net:" or maybe even
> > "net/tracing", as this really has nothing to do with ftrace code. The
> > tracepoints are more generic than ftrace.
> 
> Do you guys mind if I take this in via the net-2.6 tree?

I have no issue with that. Heck, I prefer it.

-- Steve


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - June 2, 2011, 9:04 p.m.
From: Steven Rostedt <rostedt@goodmis.org>
Date: Thu, 02 Jun 2011 17:01:52 -0400

> On Thu, 2011-06-02 at 13:59 -0700, David Miller wrote:
>> From: Steven Rostedt <rostedt@goodmis.org>
>> Date: Tue, 31 May 2011 11:14:27 -0400
>> 
>> > Note, the subject should not be "ftrace:", but "net:" or maybe even
>> > "net/tracing", as this really has nothing to do with ftrace code. The
>> > tracepoints are more generic than ftrace.
>> 
>> Do you guys mind if I take this in via the net-2.6 tree?
> 
> I have no issue with that. Heck, I prefer it.

Great, thanks!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Koki Sanagi - June 3, 2011, 5:03 a.m.
(2011/06/03 6:04), David Miller wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
> Date: Thu, 02 Jun 2011 17:01:52 -0400
> 
>> On Thu, 2011-06-02 at 13:59 -0700, David Miller wrote:
>>> From: Steven Rostedt <rostedt@goodmis.org>
>>> Date: Tue, 31 May 2011 11:14:27 -0400
>>>
>>>> Note, the subject should not be "ftrace:", but "net:" or maybe even
>>>> "net/tracing", as this really has nothing to do with ftrace code. The
>>>> tracepoints are more generic than ftrace.
>>>
>>> Do you guys mind if I take this in via the net-2.6 tree?
>>
>> I have no issue with that. Heck, I prefer it.
> 
> Great, thanks!

O.K.
I will resubmit this patch with appropriate subject "net/tracing:"

Thanks,
Koki Sanagi

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - June 3, 2011, 5:23 a.m.
From: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
Date: Fri, 03 Jun 2011 14:03:51 +0900

> I will resubmit this patch with appropriate subject "net/tracing:"

You don't need to, I fixed it up when I applied your patch.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Koki Sanagi - June 3, 2011, 5:57 a.m.
(2011/06/03 14:23), David Miller wrote:
> From: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> Date: Fri, 03 Jun 2011 14:03:51 +0900
> 
>> I will resubmit this patch with appropriate subject "net/tracing:"
> 
> You don't need to, I fixed it up when I applied your patch.
> 

O.K.

Thanks!
Koki Sanagi
 


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/include/trace/events/net.h b/include/trace/events/net.h
index 5f247f5..f99645d 100644
--- a/include/trace/events/net.h
+++ b/include/trace/events/net.h
@@ -12,22 +12,24 @@ 
 TRACE_EVENT(net_dev_xmit,
 
 	TP_PROTO(struct sk_buff *skb,
-		 int rc),
+		 int rc,
+		 struct net_device *dev,
+		 unsigned int skb_len),
 
-	TP_ARGS(skb, rc),
+	TP_ARGS(skb, rc, dev, skb_len),
 
 	TP_STRUCT__entry(
 		__field(	void *,		skbaddr		)
 		__field(	unsigned int,	len		)
 		__field(	int,		rc		)
-		__string(	name,		skb->dev->name	)
+		__string(	name,		dev->name	)
 	),
 
 	TP_fast_assign(
 		__entry->skbaddr = skb;
-		__entry->len = skb->len;
+		__entry->len = skb_len;
 		__entry->rc = rc;
-		__assign_str(name, skb->dev->name);
+		__assign_str(name, dev->name);
 	),
 
 	TP_printk("dev=%s skbaddr=%p len=%u rc=%d",
diff --git a/net/core/dev.c b/net/core/dev.c
index d945379..f0e15df 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2089,6 +2089,7 @@  int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
 	int rc = NETDEV_TX_OK;
+	unsigned int skb_len;
 
 	if (likely(!skb->next)) {
 		u32 features;
@@ -2139,8 +2140,9 @@  int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 			}
 		}
 
+		skb_len = skb->len;
 		rc = ops->ndo_start_xmit(skb, dev);
-		trace_net_dev_xmit(skb, rc);
+		trace_net_dev_xmit(skb, rc, dev, skb_len);
 		if (rc == NETDEV_TX_OK)
 			txq_trans_update(txq);
 		return rc;
@@ -2160,8 +2162,9 @@  gso:
 		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
 			skb_dst_drop(nskb);
 
+		skb_len = nskb->len;
 		rc = ops->ndo_start_xmit(nskb, dev);
-		trace_net_dev_xmit(nskb, rc);
+		trace_net_dev_xmit(nskb, rc, dev, skb_len);
 		if (unlikely(rc != NETDEV_TX_OK)) {
 			if (rc & ~NETDEV_TX_MASK)
 				goto out_kfree_gso_skb;