diff mbox

bonding: clear header_ops when last slave detached (v2)

Message ID 1416374292-10993-1-git-send-email-wen.gang.wang@oracle.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Wengang Wang Nov. 19, 2014, 5:18 a.m. UTC
When last slave of a bonding master is removed, the bonding then does not work.
When packet_snd is called against with a master net_device, it accesses
header_ops. In case the header_ops is not valid any longer(say ipoib module
unloaded), it will then access an invalid memory address.
This patch try to fix this issue by clearing header_ops when last slave
detached.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
---
 drivers/net/bonding/bond_main.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Wengang Wang Nov. 19, 2014, 5:22 a.m. UTC | #1
Hi Jay and Eric,

Could you please review the 2nd prompt?

thanks,
wengang

于 2014年11月19日 13:18, Wengang Wang 写道:
> When last slave of a bonding master is removed, the bonding then does not work.
> When packet_snd is called against with a master net_device, it accesses
> header_ops. In case the header_ops is not valid any longer(say ipoib module
> unloaded), it will then access an invalid memory address.
> This patch try to fix this issue by clearing header_ops when last slave
> detached.
>
> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
> ---
>   drivers/net/bonding/bond_main.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index c9ac06c..52a7e4b 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1730,6 +1730,7 @@ static int __bond_release_one(struct net_device *bond_dev,
>   	bond->slave_cnt--;
>   
>   	if (!bond_has_slaves(bond)) {
> +		bond->dev->header_ops = NULL;
>   		call_netdevice_notifiers(NETDEV_CHANGEADDR, bond->dev);
>   		call_netdevice_notifiers(NETDEV_RELEASE, bond->dev);
>   	}

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Nov. 19, 2014, 5:39 a.m. UTC | #2
On Wed, 2014-11-19 at 13:18 +0800, Wengang Wang wrote:
> When last slave of a bonding master is removed, the bonding then does not work.
> When packet_snd is called against with a master net_device, it accesses
> header_ops. In case the header_ops is not valid any longer(say ipoib module
> unloaded), it will then access an invalid memory address.
> This patch try to fix this issue by clearing header_ops when last slave
> detached.
> 
> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
> ---
>  drivers/net/bonding/bond_main.c | 1 +
>  1 file changed, 1 insertion(+)

It seems you basically ignored my feedback.



Some code is doing :

[A] if (dev->header_ops) {
    ...
[B]    dev->header_ops->XXX()



Nothing prevents you doing the clear after [A] , and before [B]



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wengang Wang Nov. 19, 2014, 7 a.m. UTC | #3
Hi Eric,

于 2014年11月19日 13:39, Eric Dumazet 写道:
> On Wed, 2014-11-19 at 13:18 +0800, Wengang Wang wrote:
>> When last slave of a bonding master is removed, the bonding then does not work.
>> When packet_snd is called against with a master net_device, it accesses
>> header_ops. In case the header_ops is not valid any longer(say ipoib module
>> unloaded), it will then access an invalid memory address.
>> This patch try to fix this issue by clearing header_ops when last slave
>> detached.
>>
>> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
>> ---
>>   drivers/net/bonding/bond_main.c | 1 +
>>   1 file changed, 1 insertion(+)
> It seems you basically ignored my feedback.
>
>
>
> Some code is doing :
>
> [A] if (dev->header_ops) {
>      ...
> [B]    dev->header_ops->XXX()
>
>
>
> Nothing prevents you doing the clear after [A] , and before [B]
>

Yes, that's true. So the simplest way is move ipoib_header_ops to vmlinux.

thanks for review.
wengang
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cong Wang Nov. 19, 2014, 10:26 p.m. UTC | #4
On Tue, Nov 18, 2014 at 11:00 PM, Wengang <wen.gang.wang@oracle.com> wrote:
>
> Yes, that's true. So the simplest way is move ipoib_header_ops to vmlinux.
>

That is not an option. Perhaps you need RCU to protect the dev->header_ops
pointer.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cong Wang Nov. 19, 2014, 10:56 p.m. UTC | #5
On Wed, Nov 19, 2014 at 2:26 PM, Cong Wang <cwang@twopensource.com> wrote:
> On Tue, Nov 18, 2014 at 11:00 PM, Wengang <wen.gang.wang@oracle.com> wrote:
>>
>> Yes, that's true. So the simplest way is move ipoib_header_ops to vmlinux.
>>
>
> That is not an option. Perhaps you need RCU to protect the dev->header_ops
> pointer.

Or maybe take a refcount of that module.

BTW,  why this is a problem only for bonding? Doesn't neigh have the
same bug? Or it takes the module refcount somewhere I don't find?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Nov. 20, 2014, 6:41 a.m. UTC | #6
On Wed, 2014-11-19 at 14:26 -0800, Cong Wang wrote:
> On Tue, Nov 18, 2014 at 11:00 PM, Wengang <wen.gang.wang@oracle.com> wrote:
> >
> > Yes, that's true. So the simplest way is move ipoib_header_ops to vmlinux.
> >
> 
> That is not an option. Perhaps you need RCU to protect the dev->header_ops
> pointer.

This _is_ a reasonable option, especially for stable kernels

ipoib_hard_header() is 100 bytes or less. Adding infrastructure all over
the kernel to be able to use RCU or module refcounting will cost much
more.

Tell me why it is ok for eth_header_ops() being static (while its _much_
bigger), and not for ipoib_header_ops. This looks pretty arbitrary to
me.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cong Wang Nov. 20, 2014, 5:34 p.m. UTC | #7
On Wed, Nov 19, 2014 at 10:41 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2014-11-19 at 14:26 -0800, Cong Wang wrote:
>> On Tue, Nov 18, 2014 at 11:00 PM, Wengang <wen.gang.wang@oracle.com> wrote:
>> >
>> > Yes, that's true. So the simplest way is move ipoib_header_ops to vmlinux.
>> >
>>
>> That is not an option. Perhaps you need RCU to protect the dev->header_ops
>> pointer.
>
> This _is_ a reasonable option, especially for stable kernels
>
> ipoib_hard_header() is 100 bytes or less. Adding infrastructure all over
> the kernel to be able to use RCU or module refcounting will cost much
> more.

I didn't look into ipoib_header_ops, thought it might have some dependency
on symbols. So _generally_ speaking, we can't do that, the dependency
chain could be very long. It could be an exception and good luck
to play with module symbols.

>
> Tell me why it is ok for eth_header_ops() being static (while its _much_
> bigger), and not for ipoib_header_ops. This looks pretty arbitrary to
> me.
>

Simple, eth_* is kinda of a builtin library now, too many drivers use them.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Nov. 20, 2014, 8:41 p.m. UTC | #8
On Thu, 2014-11-20 at 09:34 -0800, Cong Wang wrote:

> I didn't look into ipoib_header_ops, thought it might have some dependency
> on symbols.

I did look before answering and suggesting this, you really should do
the same instead of giving advices of over engineering the stack.

Best is the enemy of the good.

Its hard to find some networking function trivial than this one.


static int ipoib_hard_header(struct sk_buff *skb,
                             struct net_device *dev,
                             unsigned short type,
                             const void *daddr, const void *saddr, unsigned len)
{
        struct ipoib_header *header;
        struct ipoib_cb *cb = ipoib_skb_cb(skb);

        header = (struct ipoib_header *) skb_push(skb, sizeof *header);

        header->proto = htons(type);
        header->reserved = 0;

        /*
         * we don't rely on dst_entry structure,  always stuff the
         * destination address into skb->cb so we can figure out where
         * to send the packet later.
         */
        memcpy(cb->hwaddr, daddr, INFINIBAND_ALEN);

        return sizeof *header;
}


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cong Wang Nov. 20, 2014, 9:57 p.m. UTC | #9
On Thu, Nov 20, 2014 at 12:41 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2014-11-20 at 09:34 -0800, Cong Wang wrote:
>
>> I didn't look into ipoib_header_ops, thought it might have some dependency
>> on symbols.
>
> I did look before answering and suggesting this, you really should do
> the same instead of giving advices of over engineering the stack.
>
> Best is the enemy of the good.
>
> Its hard to find some networking function trivial than this one.

What about other modules defining *header_ops? Don't they
need to move to vmlinux as well?

I still don't like this workaround even just for stable. Although
definitely a real fix could be harder to backport, for me it is normal
backport 8+ patches to stable:

http://www.spinics.net/lists/stable/msg66122.html
http://www.spinics.net/lists/linux-fsdevel/msg79967.html

I know you disagree, I don't even want to waste time on arguing it.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Nov. 20, 2014, 10:03 p.m. UTC | #10
On Thu, 2014-11-20 at 13:57 -0800, Cong Wang wrote:
> On Thu, Nov 20, 2014 at 12:41 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Thu, 2014-11-20 at 09:34 -0800, Cong Wang wrote:
> >
> >> I didn't look into ipoib_header_ops, thought it might have some dependency
> >> on symbols.
> >
> > I did look before answering and suggesting this, you really should do
> > the same instead of giving advices of over engineering the stack.
> >
> > Best is the enemy of the good.
> >
> > Its hard to find some networking function trivial than this one.
> 
> What about other modules defining *header_ops? Don't they
> need to move to vmlinux as well?

Yep, if they can be in a bonding device, for practical reasons, not to
prove your point.

> 
> I still don't like this workaround even just for stable. Although
> definitely a real fix could be harder to backport, for me it is normal
> backport 8+ patches to stable:
> 
> http://www.spinics.net/lists/stable/msg66122.html
> http://www.spinics.net/lists/linux-fsdevel/msg79967.html
> 
> I know you disagree, I don't even want to waste time on arguing it.

Whatever, I really don't care.

Do your stuff, but don't ask people asking for an easy fix to do the
heart surgery.

Provide a patch, please.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cong Wang Nov. 20, 2014, 10:13 p.m. UTC | #11
On Thu, Nov 20, 2014 at 2:03 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Provide a patch, please.
>
>

Don't blame me.

I want to provide a real fix, you want a minimum fix for stable.
We agree that we disagree on this point, right? What's
more, according to your rule, I should yield to you when I
touch something you want to touch.

Also, no one seems to care about my previous question:
why only bonding has the problem?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jay Vosburgh Nov. 20, 2014, 10:53 p.m. UTC | #12
Cong Wang <cwang@twopensource.com> wrote:

>Also, no one seems to care about my previous question:
>why only bonding has the problem?

	Bonding has the problem because it stashes a pointer to a data
structure (the header_ops) from another module, and when that module is
unloaded the dangling pointer may be dereferenced if it's not either
cleared or made to never go away.

	Setting the bonding->header_ops to NULL (to avoid the current
problem with pktgen) has a race in dev_hard_header between where the
header_ops pointer is checked and where the ->create function is called.

	This pointer business is the main reason the bonding path for
"not ARPHRD_ETHER" (i.e., ipoib) has extra complexity in the open/close
path, e.g.,

bond_slave_netdev_event():
[...]
        switch (event) {
        case NETDEV_UNREGISTER:
                if (bond_dev->type != ARPHRD_ETHER)
                        bond_release_and_destroy(bond_dev, slave_dev);
                else
                        bond_release(bond_dev, slave_dev);

	If the ipoib ops were static in vmlinux, that would resolve the
pktgen problem, and also may eliminate the need for some of the ugly
bits like what I've pasted in above.

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cong Wang Nov. 21, 2014, 6:17 p.m. UTC | #13
On Thu, Nov 20, 2014 at 2:53 PM, Jay Vosburgh
<jay.vosburgh@canonical.com> wrote:
> Cong Wang <cwang@twopensource.com> wrote:
>
>>Also, no one seems to care about my previous question:
>>why only bonding has the problem?
>
>         Bonding has the problem because it stashes a pointer to a data
> structure (the header_ops) from another module, and when that module is
> unloaded the dangling pointer may be dereferenced if it's not either
> cleared or made to never go away.

I knew, please re-read my question, I was asking why ONLY bonding
has the problem, i.e. why not neigh or whatever else calling
header_ops->foo()? :)

As I said, I may miss some try_get_module() somewhere of course.
Needs more digging.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jay Vosburgh Nov. 21, 2014, 6:54 p.m. UTC | #14
Cong Wang <cwang@twopensource.com> wrote:

>On Thu, Nov 20, 2014 at 2:53 PM, Jay Vosburgh
><jay.vosburgh@canonical.com> wrote:
>> Cong Wang <cwang@twopensource.com> wrote:
>>
>>>Also, no one seems to care about my previous question:
>>>why only bonding has the problem?
>>
>>         Bonding has the problem because it stashes a pointer to a data
>> structure (the header_ops) from another module, and when that module is
>> unloaded the dangling pointer may be dereferenced if it's not either
>> cleared or made to never go away.
>
>I knew, please re-read my question, I was asking why ONLY bonding
>has the problem, i.e. why not neigh or whatever else calling
>header_ops->foo()? :)
>
>As I said, I may miss some try_get_module() somewhere of course.
>Needs more digging.

	My explanation is why only bonding has the problem; it's keeping
a pointer (in bond_dev->header_ops) that is copied from the slave
device's ->header_ops, and clearing that stashed pointer is (a) not
correctly synchronized with the removal of the slave device, and (b)
trying to simply clear the pointer has a check then use race in
dev_hard_header.

	8021q, for example, uses a "passthru" header_ops to call the
underlying device's header_ops, but 8021q is only for ethernet, and the
eth_header_ops are static in vmlinux, so it won't see these problems.

	Actually, now that I think about it, when the last ipoib slave
is released, the bonding master device is theoretically supposed to be
removed to avoid the sort of problem we're discussing here.

	That apparently isn't happening, unless Wengang is running
pktgen and simultaneously removing the ipoib module (racing the transmit
against the removal), or maybe something else is going on (perhaps
pktgen holds a reference to the bonding master, preventing its removal).

	Also, curiously, looking at pkgten, pktgen_setup_dev appears to
only accept devices of type ARPHRD_ETHER, but bonding with an ipoib
slave would be ARPHRD_INFINIBAND.  I'm therefore not sure how Wengang
configured pktgen over an ipoib bond.

	Wengang, what kernel are you using, and is your kernel modified
to change pktgen_setup_dev?

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Nov. 21, 2014, 6:55 p.m. UTC | #15
From: Cong Wang <cwang@twopensource.com>
Date: Fri, 21 Nov 2014 10:17:12 -0800

> On Thu, Nov 20, 2014 at 2:53 PM, Jay Vosburgh
> <jay.vosburgh@canonical.com> wrote:
>> Cong Wang <cwang@twopensource.com> wrote:
>>
>>>Also, no one seems to care about my previous question:
>>>why only bonding has the problem?
>>
>>         Bonding has the problem because it stashes a pointer to a data
>> structure (the header_ops) from another module, and when that module is
>> unloaded the dangling pointer may be dereferenced if it's not either
>> cleared or made to never go away.
> 
> I knew, please re-read my question, I was asking why ONLY bonding
> has the problem, i.e. why not neigh or whatever else calling
> header_ops->foo()? :)

They are either static only (ipv4) or cannot be unloaded as a module
after being loaded (ipv6).
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wengang Wang Nov. 24, 2014, 3:05 a.m. UTC | #16
于 2014年11月22日 02:54, Jay Vosburgh 写道:
> Cong Wang <cwang@twopensource.com> wrote:
>
>> On Thu, Nov 20, 2014 at 2:53 PM, Jay Vosburgh
>> <jay.vosburgh@canonical.com> wrote:
>>> Cong Wang <cwang@twopensource.com> wrote:
>>>
>>>> Also, no one seems to care about my previous question:
>>>> why only bonding has the problem?
>>>          Bonding has the problem because it stashes a pointer to a data
>>> structure (the header_ops) from another module, and when that module is
>>> unloaded the dangling pointer may be dereferenced if it's not either
>>> cleared or made to never go away.
>> I knew, please re-read my question, I was asking why ONLY bonding
>> has the problem, i.e. why not neigh or whatever else calling
>> header_ops->foo()? :)
>>
>> As I said, I may miss some try_get_module() somewhere of course.
>> Needs more digging.
> 	My explanation is why only bonding has the problem; it's keeping
> a pointer (in bond_dev->header_ops) that is copied from the slave
> device's ->header_ops, and clearing that stashed pointer is (a) not
> correctly synchronized with the removal of the slave device, and (b)
> trying to simply clear the pointer has a check then use race in
> dev_hard_header.
>
> 	8021q, for example, uses a "passthru" header_ops to call the
> underlying device's header_ops, but 8021q is only for ethernet, and the
> eth_header_ops are static in vmlinux, so it won't see these problems.
>
> 	Actually, now that I think about it, when the last ipoib slave
> is released, the bonding master device is theoretically supposed to be
> removed to avoid the sort of problem we're discussing here.
>
> 	That apparently isn't happening, unless Wengang is running
> pktgen and simultaneously removing the ipoib module (racing the transmit
> against the removal), or maybe something else is going on (perhaps
> pktgen holds a reference to the bonding master, preventing its removal).
>
> 	Also, curiously, looking at pkgten, pktgen_setup_dev appears to
> only accept devices of type ARPHRD_ETHER, but bonding with an ipoib
> slave would be ARPHRD_INFINIBAND.  I'm therefore not sure how Wengang
> configured pktgen over an ipoib bond.
>
> 	Wengang, what kernel are you using, and is your kernel modified
> to change pktgen_setup_dev?
>
> 	-J

It's a 2.6.39 kernel.
code is like this:

static int pktgen_setup_dev(struct pktgen_dev *pkt_dev, const char *ifname)
{
         struct net_device *odev;
         int err;

         /* Clean old setups */
         if (pkt_dev->odev) {
                 dev_put(pkt_dev->odev);
                 pkt_dev->odev = NULL;
         }

         odev = pktgen_dev_get_by_name(pkt_dev, ifname);
         if (!odev) {
                 pr_err("no such netdevice: \"%s\"\n", ifname);
                 return -ENODEV;
         }

         if (odev->type != ARPHRD_ETHER) {
                 pr_err("not an ethernet device: \"%s\"\n", ifname);
                 err = -EINVAL;
         } else if (!netif_running(odev)) {
                 pr_err("device is down: \"%s\"\n", ifname);
                 err = -ENETDOWN;
         } else {
                 pkt_dev->odev = odev;
                 return 0;
         }

         dev_put(odev);
         return err;
}

No change done to it.

This problem is a side product when I was working with another area. I 
am so far not very clear about the setup(no env to check now either).

thanks,
wengang
> ---
> 	-Jay Vosburgh, jay.vosburgh@canonical.com

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index c9ac06c..52a7e4b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1730,6 +1730,7 @@  static int __bond_release_one(struct net_device *bond_dev,
 	bond->slave_cnt--;
 
 	if (!bond_has_slaves(bond)) {
+		bond->dev->header_ops = NULL;
 		call_netdevice_notifiers(NETDEV_CHANGEADDR, bond->dev);
 		call_netdevice_notifiers(NETDEV_RELEASE, bond->dev);
 	}