
[net-next-2.6] ipv4: sysctl to block responding on down interface

Message ID 20100611084854.0680c014@nehalam
State Rejected, archived
Delegated to: David Miller

Commit Message

Stephen Hemminger June 11, 2010, 3:48 p.m. UTC
When Linux is used as a router, it is undesirable for the kernel to process
incoming packets when the address assigned to the interface is down.
The initial problem report was for a management application that used ICMP
to check link availability.

The default is disabled to maintain compatibility with previous behavior.
This is not recommended for server systems because it makes fail over more
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller June 22, 2010, 5:15 p.m. UTC | #1
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Fri, 11 Jun 2010 08:48:54 -0700

> The initial problem report was for a management application that used ICMP
> to check link availability.

That application is buggy, and even if we apply this patch it will
only properly function when speaking to systems in a non-default
configuration.  And, it would be a non-default setting which, by your
own admission below, cannot function properly in valid interface
configurations.

It's easier to fix the app to work in all cases than to add another
sysctl knob hack for a segment of the world that can't seem to wrap
their head around the fact that our behavior is valid, specified, and
an explicit design decision meant to increase the chances of
successful communication between two systems.

> The default is disabled to maintain compatibility with previous behavior.
> This is not recommended for server systems because it makes fail over more
> difficult, and does not account for configurations where multiple interfaces
> have the same IP address.

The fact that the sysctl knob, when enabled, can't even function properly
in this "multiple interfaces with same address" case is another reason I
have decided to not apply this.
Joakim Tjernlund June 28, 2010, 7:03 p.m. UTC | #2
Stephen Hemminger <shemminger@vyatta.com> wrote on 2010/06/11 17:48:54:
>
> When Linux is used as a router, it is undesirable for the kernel to process
> incoming packets when the address assigned to the interface is down.
> The initial problem report was for a management application that used ICMP
> to check link availability.
>
> The default is disabled to maintain compatibility with previous behavior.
> This is not recommended for server systems because it makes fail over more
> difficult, and does not account for configurations where multiple interfaces
> have the same IP address.
>
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Ping David et al.?
I too want this.

 Jocke

Eric Dumazet June 28, 2010, 7:42 p.m. UTC | #3
On Monday, 28 June 2010 at 21:03 +0200, Joakim Tjernlund wrote:
> Ping David et al.?
> I too want this.

You probably missed David's reply

http://permalink.gmane.org/gmane.linux.network/164494



Joakim Tjernlund June 28, 2010, 9:09 p.m. UTC | #4
Eric Dumazet <eric.dumazet@gmail.com> wrote on 2010/06/28 21:42:01:
>
> You probably missed David's reply
>
> http://permalink.gmane.org/gmane.linux.network/164494

Sure did, don't know how that happened, sorry.

Reading David's reply I do wonder about the current behaviour. Why
is it so important to keep responding to an IP address when the
admin has put the interface holding that IP address into administratively
down state? I don't think the weak host model stipulates that it must be so, does it?

To me, "ifconfig eth0 down" means not only to stop using the I/F but
also any IP address associated with the I/F. I was rather surprised that
it didn't work that way. I don't see any way to make Linux stop responding to
that IP other than removing it completely from the system, which is rather
awkward.

Note, I don't mean that the same should be applied for the No Carrier case, just
ifconfig down.

 Jocke


Mitchell Erblich June 28, 2010, 9:28 p.m. UTC | #5
On Jun 28, 2010, at 2:09 PM, Joakim Tjernlund wrote:

> Reading David's reply I do wonder about the current behaviour. Why
> is it so important to keep responding to an IP address when the
> admin has put the interface holding that IP address into administratively
> down state? I don't think the weak host model stipulates that it must be so, does it?
> 
> To me, "ifconfig eth0 down" means not only to stop using the I/F but
> also any IP address associated with the I/F. I was rather surprised that
> it didn't work that way. I don't see any way to make Linux stop responding to
> that IP other than removing it completely from the system, which is rather
> awkward.
> 
> Note, I don't mean that the same should be applied for the No Carrier case, just
> ifconfig down.

Hey guys, doesn't support for magic packets / Energy Star require receiving
packets while the interface is down?

Mitchell Erblich
David Miller June 28, 2010, 9:57 p.m. UTC | #6
From: Joakim Tjernlund <joakim.tjernlund@transmode.se>
Date: Mon, 28 Jun 2010 23:09:02 +0200

> To me, "ifconfig eth0 down" means not only to stop using the I/F
> but also any IP address associated with the I/F.

IP addresses are associated with the host, not a particular interface.

Therefore the state of the interface should not influence the behavior
of the IP address.

If you want the IP address to stop being responded to, delete the IP
address.
Joakim Tjernlund June 28, 2010, 9:58 p.m. UTC | #7
Mitchell Erblich <erblichs@earthlink.net> wrote on 2010/06/28 23:28:29:
> Hey guys, doesn't support for magic packets / Energy Star require receiving
> packets while the interface is down?

No idea, but if so, does it need to process IP packets destined for
the IP address in question and pass these up to user space?

 Jocke

Joakim Tjernlund June 28, 2010, 11:30 p.m. UTC | #8
David Miller <davem@davemloft.net> wrote on 2010/06/28 23:57:44:
>
> From: Joakim Tjernlund <joakim.tjernlund@transmode.se>
> Date: Mon, 28 Jun 2010 23:09:02 +0200
>
> > To me, "ifconfig eth0 down" means not only to stop using the I/F
> > but also any IP address associated with the I/F.
>
> IP addresses are associated with the host, not a particular interface.
>
> Therefore the state of the interface should not influence the behavior
> of the IP address.
>
> If you want the IP address to stop being responded to, delete the IP
> address.

This is a strict interpretation of the weak host model and does not
answer my questions. Would you mind elaborating on why you take such a
strict view, and what is gained by answering on an IP address which has
been "downed"? What types of apps use this property?

 Jocke

David Miller June 29, 2010, 3:01 a.m. UTC | #9
From: Joakim Tjernlund <joakim.tjernlund@transmode.se>
Date: Tue, 29 Jun 2010 01:30:26 +0200

> This is a strict interpretation of the weak host model and does not
> answer my questions. Would you mind elaborating on why you take such a
> strict view, and what is gained by answering on an IP address which has
> been "downed"?

IP addresses are never "downed" just as your default route is not
"downed" when you take down an interface.

Rather, hosts are configured with an IP address and when they are so
configured they respond to it and can generate local application
sourced packets with that IP address as a source.

And what this means is that even in situations where hosts are
slightly mis-configured communication between them can still be
possible.  That's the goal of the weak host model: to get a host to
respond to IP datagrams in every situation where such an act is
plausible.

All of the design decisions we've made in the networking in this area
are meant to increase the likelihood of successful communication
between two hosts.

And in the 10+ years this behavior has existed, I know for sure that
people have ended up with a working network because of the way we
do things.

So from that perspective it doesn't matter one iota what you or any
other particular entity wish things to be, since 10+ years of having
this behavior is ingrained enough that changing it is guaranteed to
break someone's setup, so we absolutely can't do it.

This topic comes up at least once every few months, therefore someone
should post a FAQ somewhere because it's tiring to explain over and
over again why this is a good design decision and why the default
behavior is never going to change.

The RFCs allow both models equally, and just because many other
systems do things the other way doesn't make it any better or more
valid than what Linux is doing.
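
The two models referred to above are usually described in RFC 1122 terms as the weak and strong end-system models. The difference can be sketched as a small user-space model. This is an illustration under stated assumptions, not kernel code: struct model_addr, enum host_model and accepts_locally() are hypothetical names, and interface state is deliberately left out because, in the weak model as described above, the address belongs to the host.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model types; none of these exist in the kernel. */
struct model_addr {
	uint32_t addr;	/* IPv4 address in host byte order */
	int ifindex;	/* interface the address was configured on */
};

enum host_model { WEAK_HOST, STRONG_HOST };

/* Decide whether a datagram for 'daddr' arriving on 'rx_ifindex' is
 * treated as local.
 * Weak model:   any address configured on the host matches.
 * Strong model: only an address configured on the receiving interface. */
bool accepts_locally(enum host_model model,
		     const struct model_addr *addrs, size_t n,
		     uint32_t daddr, int rx_ifindex)
{
	for (size_t i = 0; i < n; i++) {
		if (addrs[i].addr != daddr)
			continue;
		if (model == WEAK_HOST || addrs[i].ifindex == rx_ifindex)
			return true;
	}
	return false;
}
```

With 192.0.2.1 configured on interface 1 and a datagram for that address arriving on interface 2, the weak model accepts it and the strong model does not.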
Stephen Hemminger June 30, 2010, 8:55 p.m. UTC | #10
On Tue, 22 Jun 2010 10:15:37 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Fri, 11 Jun 2010 08:48:54 -0700
> 
> > The initial problem report was for a management application that used ICMP
> > to check link availability.
> 
> That application is buggy, and even if we apply this patch it will
> only properly function when speaking to systems in a non-default
> configuration.  And, it would be a non-default setting which, by your
> own admission below, cannot function properly in valid interface
> configurations.

It is a remote management system, not a local application.
The management system is stupid, but it is hard to argue with
customers that the other system is broken.

> It's easier to fix the app to work in all cases than to add another
> sysctl knob hack for a segment of the world that can't seem to wrap
> their head around the fact that our behavior is valid, specified, and
> an explicit design decision meant to increase the chances of
> successful communication between two systems.
> 
> > The default is disabled to maintain compatibility with previous behavior.
> > This is not recommended for server systems because it makes fail over more
> > difficult, and does not account for configurations where multiple interfaces
> > have the same IP address.
> 
> The fact that the sysctl knob, when enabled, can't even function properly
> in this "multiple interfaces with same address" case is another reason I
> have decided to not apply this.

We already have sysctl knobs that exist to work around broken printer TCP,
middleboxes and other broken stacks; in my opinion, this is just another one
of those types of workarounds.
David Miller June 30, 2010, 8:58 p.m. UTC | #11
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 30 Jun 2010 13:55:35 -0700

>> The fact that the sysctl knob, when enabled, can't even function properly
>> in this "multiple interfaces with same address" case is another reason I
>> have decided to not apply this.
> 
> We already have sysctl knobs that exist to work around broken printer TCP,
> middleboxes and other broken stacks; in my opinion, this is just another one
> of those types of workarounds.

But that sysctl knob for the printer workaround doesn't break legitimate
configurations like this one does.
Andi Kleen July 1, 2010, 11:23 a.m. UTC | #12
Joakim Tjernlund <joakim.tjernlund@transmode.se> writes:

> Ping David et al.?
> I too want this.

Doesn't arp_filter enable this already? If you set it on the still-up
interfaces, those will not answer for other IP addresses.

This only works at the ARP level, so it has to wait until the ARP
cache in the remote host times out.

-Andi
Joakim Tjernlund July 1, 2010, 11:48 a.m. UTC | #13
Andi Kleen <andi@firstfloor.org> wrote on 2010/07/01 13:23:21:
>
> Doesn't arp_filter enable this already? If you set it on the still-up
> interfaces, those will not answer for other IP addresses.
>
> This only works at the ARP level, so it has to wait until the ARP
> cache in the remote host times out.

I tried that, but it didn't work; I didn't think of clearing
the ARP cache, though.
Anyhow, such methods seem worse than just doing ifconfig 0.0.0.0

  Jocke


Patch

difficult, and does not account for configurations where multiple interfaces
have the same IP address.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
 Documentation/networking/ip-sysctl.txt |   10 ++++++++++
 include/linux/inetdevice.h             |    2 ++
 net/ipv4/devinet.c                     |    1 +
 net/ipv4/route.c                       |    7 +++++++
 4 files changed, 20 insertions(+)

--- a/include/linux/inetdevice.h	2010-05-28 08:35:11.000000000 -0700
+++ b/include/linux/inetdevice.h	2010-06-11 08:35:55.237028136 -0700
@@ -37,6 +37,7 @@  enum
 	IPV4_DEVCONF_ACCEPT_LOCAL,
 	IPV4_DEVCONF_SRC_VMARK,
 	IPV4_DEVCONF_PROXY_ARP_PVLAN,
+	IPV4_DEVCONF_LINKFILTER,
 	__IPV4_DEVCONF_MAX
 };
 
@@ -140,6 +141,7 @@  static inline void ipv4_devconf_setall(s
 #define IN_DEV_ARP_ANNOUNCE(in_dev)	IN_DEV_MAXCONF((in_dev), ARP_ANNOUNCE)
 #define IN_DEV_ARP_IGNORE(in_dev)	IN_DEV_MAXCONF((in_dev), ARP_IGNORE)
 #define IN_DEV_ARP_NOTIFY(in_dev)	IN_DEV_MAXCONF((in_dev), ARP_NOTIFY)
+#define IN_DEV_LINKFILTER(in_dev)	IN_DEV_MAXCONF((in_dev), LINKFILTER)
 
 struct in_ifaddr {
 	struct in_ifaddr	*ifa_next;
--- a/net/ipv4/devinet.c	2010-06-01 08:39:12.000000000 -0700
+++ b/net/ipv4/devinet.c	2010-06-11 08:37:03.921248294 -0700
@@ -1416,6 +1416,7 @@  static struct devinet_sysctl_table {
 		DEVINET_SYSCTL_RW_ENTRY(ARP_ACCEPT, "arp_accept"),
 		DEVINET_SYSCTL_RW_ENTRY(ARP_NOTIFY, "arp_notify"),
 		DEVINET_SYSCTL_RW_ENTRY(PROXY_ARP_PVLAN, "proxy_arp_pvlan"),
+		DEVINET_SYSCTL_RW_ENTRY(LINKFILTER, "link_filter"),
 
 		DEVINET_SYSCTL_FLUSHING_ENTRY(NOXFRM, "disable_xfrm"),
 		DEVINET_SYSCTL_FLUSHING_ENTRY(NOPOLICY, "disable_policy"),
--- a/net/ipv4/route.c	2010-06-11 08:13:13.000000000 -0700
+++ b/net/ipv4/route.c	2010-06-11 08:14:28.486271886 -0700
@@ -2152,6 +2152,13 @@  static int ip_route_input_slow(struct sk
 		goto brd_input;
 
 	if (res.type == RTN_LOCAL) {
+		int linkf = IN_DEV_LINKFILTER(in_dev);
+
+		if (linkf && !netif_running(res.fi->fib_dev))
+			goto no_route;
+		if (linkf > 1 && !netif_carrier_ok(res.fi->fib_dev))
+			goto no_route;
+
 		err = fib_validate_source(saddr, daddr, tos,
 					     net->loopback_dev->ifindex,
 					     dev, &spec_dst, &itag, skb->mark);
--- a/Documentation/networking/ip-sysctl.txt	2010-06-11 08:14:46.889751310 -0700
+++ b/Documentation/networking/ip-sysctl.txt	2010-06-11 08:15:35.508471622 -0700
@@ -832,6 +832,16 @@  rp_filter - INTEGER
 	Default value is 0. Note that some distributions enable it
 	in startup scripts.
 
+link_filter - INTEGER
+	0 - Allow packets to be received for the address on this interface
+	even if the interface is disabled or has no carrier.
+
+	1 - Ignore packets received if the interface associated with the
+	destination address is down.
+
+	2 - Ignore packets received if the interface associated with the
+	destination address is down or has no carrier.
+
 arp_filter - BOOLEAN
 	1 - Allows you to have multiple network interfaces on the same
 	subnet, and have the ARPs for each interface be answered
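
The check added to ip_route_input_slow() in the route.c hunk above can be modeled in user space to make the three link_filter values concrete. This is a minimal sketch under simplifying assumptions: struct model_dev and link_filter_rejects() are hypothetical names, the two booleans stand in for netif_running() and netif_carrier_ok() on the device that owns the local address, and the way the kernel combines the per-device and "all" settings (IN_DEV_MAXCONF) is not modeled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the device owning the destination address. */
struct model_dev {
	bool admin_up;	/* models netif_running() */
	bool carrier;	/* models netif_carrier_ok() */
};

/* Returns true where the patch would take "goto no_route" for a packet
 * addressed to a local address owned by 'owner'.
 *   0: never reject
 *   1: reject if the owning interface is administratively down
 *   2: additionally reject if it has no carrier */
bool link_filter_rejects(int link_filter, const struct model_dev *owner)
{
	if (link_filter && !owner->admin_up)
		return true;
	if (link_filter > 1 && !owner->carrier)
		return true;
	return false;
}
```

As in the patch, any value greater than 1 behaves like 2, and the decision depends only on the state of the owning device, which is why it cannot account for several interfaces carrying the same address.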