3.6-rt: inet_sk_rx_dst_set() network splat

Message ID 1366853695.8964.120.camel@edumazet-glaptop
State Accepted, archived
Delegated to: David Miller

Commit Message

Eric Dumazet April 25, 2013, 1:34 a.m. UTC
From: Eric Dumazet <edumazet@google.com>

On Wed, 2013-04-24 at 08:50 +0200, Mike Galbraith wrote:
> Giving 3.6-rt some routine usage runtime, while updating kernel git
> repositories, the below fell out, but didn't repeat while updating other
> repositories. 
> 
> [  381.481464] ------------[ cut here ]------------
> [  381.486090] WARNING: at include/linux/skbuff.h:536 inet_sk_rx_dst_set+0x8c/0xe0()
> [  381.493566] Hardware name: MS-7502
> [  381.493612] Modules linked in: ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables nfsd snd_pcm_oss snd_mixer_oss snd_seq nfs_acl snd_seq_device auth_rpcgss edd nfs fscache lockd sunrpc bridge ipv6 stp cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf nls_iso8859_1 nls_cp437 vfat fat fuse ext3 jbd arc4 rt2800usb rt2800lib crc_ccitt rt2x00usb rt2x00lib mac80211 iTCO_wdt iTCO_vendor_support cfg80211 hid_generic rfkill usb_storage snd_hda_codec_realtek sr_mod cdrom sg snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer e1000e snd firewire_ohci firewire_core coretemp microcode soundcore lpc_ich mfd_core crc_itu_t snd_page_alloc i2c_i801 button ext4 mbcache jbd2 crc16 usbhid hid sd_mod crc_t10dif uhci_hcd ehci_hcd rtc_cmos ahci libahci libata thermal fan scsi_mod usbcore usb_common processor
> [  381.493620] Pid: 6170, comm: git Not tainted 3.6.11.1-rt32-smp #52
> [  381.493621] Call Trace:
> [  381.493626]  [<ffffffff8103cddf>] warn_slowpath_common+0x7f/0xc0
> [  381.493629]  [<ffffffff8103ce3a>] warn_slowpath_null+0x1a/0x20
> [  381.493631]  [<ffffffff813f6f0c>] inet_sk_rx_dst_set+0x8c/0xe0
> [  381.493633]  [<ffffffff813ece77>] tcp_rcv_established+0x797/0x7d0
> [  381.493636]  [<ffffffff813f82d4>] tcp_v4_do_rcv+0x134/0x220
> [  381.493638]  [<ffffffff813debc7>] tcp_prequeue_process+0x67/0xb0
> [  381.493641]  [<ffffffff813e373a>] tcp_recvmsg+0xaca/0xd70
> [  381.493645]  [<ffffffff810a627b>] ? __lock_release+0x6b/0xe0
> [  381.493648]  [<ffffffff8140f381>] inet_recvmsg+0x121/0x240
> [  381.493651]  [<ffffffff8140ead0>] ? inet_sock_destruct+0x230/0x230
> [  381.493655]  [<ffffffff8136fd49>] sock_aio_read.part.19+0xf9/0x120
> [  381.493657]  [<ffffffff8136fee0>] ? sock_aio_write+0x90/0xb0
> [  381.493660]  [<ffffffff8136fd96>] sock_aio_read+0x26/0x30
> [  381.493662]  [<ffffffff8116c503>] do_sync_read+0xa3/0xe0
> [  381.493665]  [<ffffffff8116ce9d>] vfs_read+0x14d/0x160
> [  381.493667]  [<ffffffff8116cefd>] sys_read+0x4d/0x90
> [  381.493670]  [<ffffffff81481812>] system_call_fastpath+0x16/0x1b
> [  381.493671] ---[ end trace 0000000000000002 ]---
> 
>  529 static inline struct dst_entry *skb_dst(const struct sk_buff *skb)
>  530 {
>  531         /* If refdst was not refcounted, check we still are in a
>  532          * rcu_read_lock section
>  533          */
>  534         WARN_ON((skb->_skb_refdst & SKB_DST_NOREF) &&
>  535                 !rcu_read_lock_held() &&
>  536                 !rcu_read_lock_bh_held());
>  537         return (struct dst_entry *)(skb->_skb_refdst & SKB_DST_PTRMASK);
>  538 }
> 

Thanks for the report, here is a fix.

It will be a bit of a hassle to merge this one on net-next, as
tcp_prequeue() was moved in commit
b2fb4f54ecd47c42413d54b4666b06cf93c05abf
(tcp: uninline tcp_prequeue() )

David, maybe you prefer to pull into net tree the move, then I respin
the fix ?

[PATCH] tcp: force a dst refcount when prequeue packet

Before escaping RCU protected section and adding packet into
prequeue, make sure the dst is refcounted.

Reported-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/tcp.h |    1 +
 1 file changed, 1 insertion(+)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller April 25, 2013, 4:36 a.m. UTC | #1
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 24 Apr 2013 18:34:55 -0700

> From: Eric Dumazet <edumazet@google.com>
 ...
> Thanks for the report, here is a fix.
> 
> It will be a bit of a hassle to merge this one on net-next, as
> tcp_prequeue() was moved in commit
> b2fb4f54ecd47c42413d54b4666b06cf93c05abf
> (tcp: uninline tcp_prequeue() )
> 
> David, maybe you prefer to pull into net tree the move, then I respin
> the fix ?
> 
> [PATCH] tcp: force a dst refcount when prequeue packet

I'll apply this to 'net'. Since you've explained the conflict, I now
know how to resolve this when I next merge into 'net-next'.

Stephen, there is a bug fix going into 'net' for tcp_prequeue()
however in 'net-next' tcp_prequeue() has simply moved from one
place to another.  The merge resolution is to simply move the
new skb_dst_force() call to the new location.
Mike Galbraith April 25, 2013, 4:51 a.m. UTC | #2
On Wed, 2013-04-24 at 18:34 -0700, Eric Dumazet wrote: 
> From: Eric Dumazet <edumazet@google.com>
> 
> On Wed, 2013-04-24 at 08:50 +0200, Mike Galbraith wrote:
 ...
> 
> Thanks for the report, here is a fix.

Thanks for the fix.  I'll apply it and beat on the box some more (g.p.).
Gripe only happened the one time during long workout day.

-Mike

Eric Dumazet April 25, 2013, 5:03 a.m. UTC | #3
On Thu, 2013-04-25 at 06:51 +0200, Mike Galbraith wrote:

> Thanks for the fix.  I'll apply it and beat on the box some more (g.p.).
> Gripe only happened the one time during long workout day.

To force it to happen, you need to invalidate the cached dst in socket.

So presumably doing in the background some "ip ro ..." commands to
add/delete routes.

Then prequeue happens if you use a program blocking on receive (not
using poll()/select()/epoll())
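
The "program blocking on receive" half of that recipe can be sketched as below. This is only an illustration, not something posted in the thread: drain_blocking() is a hypothetical helper name, and it assumes the caller already holds a connected TCP socket in fd. The point is simply that the reader sits in read() rather than waiting in poll()/select()/epoll(), which is the condition named above for packets to land on the prequeue.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: consume everything from fd using plain blocking
 * read() calls (no poll/select/epoll). Returns the total number of bytes
 * read once EOF is reached, or -1 on a read error.
 */
static long drain_blocking(int fd)
{
	char buf[4096];
	long total = 0;

	assert(fd >= 0);

	for (;;) {
		ssize_t n = read(fd, buf, sizeof(buf));

		if (n > 0) {
			total += n;	/* got data, keep blocking for more */
			continue;
		}
		if (n == 0)
			return total;	/* peer closed the connection */
		if (errno == EINTR)
			continue;	/* interrupted by a signal, retry */
		return -1;		/* real error */
	}
}
```

Running such a reader against a steady TCP stream while routes are added and deleted in the background would correspond to the setup described above; the route churn itself is not shown here.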



Sebastian Andrzej Siewior April 26, 2013, 10:50 a.m. UTC | #4
* Eric Dumazet | 2013-04-24 22:03:53 [-0700]:

>On Thu, 2013-04-25 at 06:51 +0200, Mike Galbraith wrote:
>
>> Thanks for the fix.  I'll apply it and beat on the box some more (g.p.).
>> Gripe only happened the one time during long workout day.
>
>To force it to happen, you need to invalidate the cached dst in socket.
>
>So presumably doing in the background some "ip ro ..." commands to
>add/delete routes.

With these instructions I can crash it in about 3-5 minutes; with your patch
it has now been running for 50 minutes. Thanks.

Sebastian
Eric Dumazet April 26, 2013, 2:14 p.m. UTC | #5
On Fri, 2013-04-26 at 12:50 +0200, Sebastian Andrzej Siewior wrote:
> * Eric Dumazet | 2013-04-24 22:03:53 [-0700]:
> 
> >On Thu, 2013-04-25 at 06:51 +0200, Mike Galbraith wrote:
> >
> >> Thanks for the fix.  I'll apply it and beat on the box some more (g.p.).
> >> Gripe only happened the one time during long workout day.
> >
> >To force it to happen, you need to invalidate the cached dst in socket.
> >
> >So presumably doing in the background some "ip ro ..." commands to
> >add/delete routes.
> 
> With these instructions I can crash it in about 3-5 minutes; with your patch
> it has now been running for 50 minutes. Thanks.

Thanks for testing !


Steven Rostedt May 2, 2013, 12:49 a.m. UTC | #6
On Fri, 2013-04-26 at 12:50 +0200, Sebastian Andrzej Siewior wrote:
> * Eric Dumazet | 2013-04-24 22:03:53 [-0700]:
> 
> >On Thu, 2013-04-25 at 06:51 +0200, Mike Galbraith wrote:
> >
> >> Thanks for the fix.  I'll apply it and beat on the box some more (g.p.).
> >> Gripe only happened the one time during long workout day.
> >
> >To force it to happen, you need to invalidate the cached dst in socket.
> >
> >So presumably doing in the background some "ip ro ..." commands to
> >add/delete routes.
> 
> With these instructions I can crash it in about 3-5 minutes; with your patch
> it has now been running for 50 minutes. Thanks.

Is this something going into the stable tree, or does it not affect 3.0,
3.2, 3.4 or 3.6?

-- Steve


Sebastian Andrzej Siewior May 3, 2013, 8:20 a.m. UTC | #7
* Steven Rostedt | 2013-05-01 20:49:39 [-0400]:

>Is this something going into the stable tree, or does it not affect 3.0,
>3.2, 3.4 or 3.6?

Mike Galbraith reported it against v3.6 so I would say that this is one
candidate. I just booted v3.4.41-rt55 and it runs for 20 minutes now
without any issues. Maybe it crashes after I sent the email but I hope
it does not :)

>-- Steve

Sebastian
Eric Dumazet May 3, 2013, 2:03 p.m. UTC | #8
On Fri, 2013-05-03 at 10:20 +0200, Sebastian Andrzej Siewior wrote:
> * Steven Rostedt | 2013-05-01 20:49:39 [-0400]:
> 
> >Is this something going into the stable tree, or does it not affect 3.0,
> >3.2, 3.4 or 3.6?
> 
> Mike Galbraith reported it against v3.6 so I would say that this is one
> candidate. I just booted v3.4.41-rt55 and it runs for 20 minutes now
> without any issues. Maybe it crashes after I sent the email but I hope
> it does not :)

That's because the particular way to trigger the bug was to use the IP
early demux, and it was added in 3.5



Steven Rostedt May 3, 2013, 3:13 p.m. UTC | #9
On Fri, 2013-05-03 at 07:03 -0700, Eric Dumazet wrote:
> On Fri, 2013-05-03 at 10:20 +0200, Sebastian Andrzej Siewior wrote:
> > * Steven Rostedt | 2013-05-01 20:49:39 [-0400]:
> > 
> > >Is this something going into the stable tree, or does it not affect 3.0,
> > >3.2, 3.4 or 3.6?
> > 
> > Mike Galbraith reported it against v3.6 so I would say that this is one
> > candidate. I just booted v3.4.41-rt55 and it runs for 20 minutes now
> > without any issues. Maybe it crashes after I sent the email but I hope
> > it does not :)
> 
> That's because the particular way to trigger the bug was to use the IP
> early demux, and it was added in 3.5

Is it OK to keep it? I pulled the patch into 3.2-rt and 3.4-rt. I can
remove it if you think it's not necessary and will cause a slight
performance penalty.

Thanks,

-- Steve



Steven Rostedt May 3, 2013, 9:56 p.m. UTC | #10
On Fri, 2013-05-03 at 11:13 -0400, Steven Rostedt wrote:
> On Fri, 2013-05-03 at 07:03 -0700, Eric Dumazet wrote:
> > On Fri, 2013-05-03 at 10:20 +0200, Sebastian Andrzej Siewior wrote:
> > > * Steven Rostedt | 2013-05-01 20:49:39 [-0400]:
> > > 
> > > >Is this something going into the stable tree, or does it not affect 3.0,
> > > >3.2, 3.4 or 3.6?
> > > 
> > > Mike Galbraith reported it against v3.6 so I would say that this is one
> > > candidate. I just booted v3.4.41-rt55 and it runs for 20 minutes now
> > > without any issues. Maybe it crashes after I sent the email but I hope
> > > it does not :)
> > 
> > That's because the particular way to trigger the bug was to use the IP
> > early demux, and it was added in 3.5
> 
> Is it OK to keep it? I pulled the patch into 3.2-rt and 3.4-rt. I can
> remove it if you think it's not necessary and will cause a slight
> performance penalty.

I took it out of 3.2-rt as I needed to add a patch and rerun my tests.
But I'll leave it in 3.4-rt (unless you think I shouldn't) just so I
don't need to rerun the tests. I'm only posting a pre-release anyway.

Thanks,

-- Steve


Eric Dumazet May 3, 2013, 9:58 p.m. UTC | #11
On Fri, 2013-05-03 at 11:13 -0400, Steven Rostedt wrote:
> On Fri, 2013-05-03 at 07:03 -0700, Eric Dumazet wrote:
> > On Fri, 2013-05-03 at 10:20 +0200, Sebastian Andrzej Siewior wrote:
> > > * Steven Rostedt | 2013-05-01 20:49:39 [-0400]:
> > > 
> > > >Is this something going into the stable tree, or does it not affect 3.0,
> > > >3.2, 3.4 or 3.6?
> > > 
> > > Mike Galbraith reported it against v3.6 so I would say that this is one
> > > candidate. I just booted v3.4.41-rt55 and it runs for 20 minutes now
> > > without any issues. Maybe it crashes after I sent the email but I hope
> > > it does not :)
> > 
> > That's because the particular way to trigger the bug was to use the IP
> > early demux, and it was added in 3.5
> 
> Is it OK to keep it? I pulled the patch into 3.2-rt and 3.4-rt. I can
> remove it if you think it's not necessary and will cause a slight
> performance penalty.

You can keep it.

I was referring to the method used to trigger the bug.

But the bug is really old and could trigger with other workloads.




Patch

diff --git a/include/net/tcp.h b/include/net/tcp.h
index cf0694d..a345480 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1049,6 +1049,7 @@  static inline bool tcp_prequeue(struct sock *sk, struct sk_buff *skb)
 	    skb_queue_len(&tp->ucopy.prequeue) == 0)
 		return false;
 
+	skb_dst_force(skb);
 	__skb_queue_tail(&tp->ucopy.prequeue, skb);
 	tp->ucopy.memory += skb->truesize;
 	if (tp->ucopy.memory > sk->sk_rcvbuf) {
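
To make the effect of the one-line change easier to see, here is a small userspace model of the low-bit tagging that the skb_dst() listing in the report relies on. It is a sketch under loud assumptions, not kernel code: the structs are cut down to a single field each, the reference count is a plain int instead of an atomic, and there is no RCU at all. The identifiers are borrowed from the kernel only so the model reads alongside the report; the bodies are simplified stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model only: bit 0 of _skb_refdst marks a dst held WITHOUT a reference. */
#define SKB_DST_NOREF	((uintptr_t)1)
#define SKB_DST_PTRMASK	(~SKB_DST_NOREF)

struct dst_entry { int refcnt; };		/* simplified stand-in */
struct sk_buff { uintptr_t _skb_refdst; };	/* simplified stand-in */

/* Attach dst without taking a reference (only safe under RCU in the kernel). */
static void skb_dst_set_noref(struct sk_buff *skb, struct dst_entry *dst)
{
	assert(((uintptr_t)dst & SKB_DST_NOREF) == 0);	/* low bit must be free */
	skb->_skb_refdst = (uintptr_t)dst | SKB_DST_NOREF;
}

/* Strip the tag bit to recover the pointer. */
static struct dst_entry *skb_dst(const struct sk_buff *skb)
{
	return (struct dst_entry *)(skb->_skb_refdst & SKB_DST_PTRMASK);
}

static int skb_dst_is_noref(const struct sk_buff *skb)
{
	return (skb->_skb_refdst & SKB_DST_NOREF) && skb_dst(skb) != NULL;
}

/* If the dst is not refcounted, take a reference and clear the tag bit. */
static void skb_dst_force(struct sk_buff *skb)
{
	if (skb_dst_is_noref(skb)) {
		struct dst_entry *dst = skb_dst(skb);

		dst->refcnt++;
		skb->_skb_refdst = (uintptr_t)dst;
	}
}
```

In this model, calling skb_dst_force() before queueing means the packet carries its own reference, so a later skb_dst() outside any RCU section no longer sees the NOREF bit that the WARN_ON at skbuff.h:536 checks for. A second call is a no-op, which is why forcing an already refcounted dst is harmless.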