Multicast socket option

Message ID 4A1EC33E.1080201@us.ibm.com
State Accepted, archived
Delegated to: David Miller

Commit Message

Nivedita Singhvi May 28, 2009, 5 p.m. UTC
After some discussion offline with Christoph Lameter and David Stevens 
regarding multicast behaviour in Linux, I'm submitting a slightly
modified patch from the one Christoph submitted earlier.

This patch provides a new socket option IP_MULTICAST_ALL.

In this case, default behaviour is _unchanged_ from the current
Linux standard. The socket option is set by default to provide 
original behaviour. Sockets wishing to receive data only from 
multicast groups they join explicitly will need to clear this 
socket option. 


Signed-off-by: Nivedita Singhvi <niv@us.ibm.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Acked-by: David Stevens <dlstevens@us.ibm.com>
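
As a rough illustration of how an application might use the option described in the commit message above, here is a minimal userspace sketch. The helper names (receive_joined_groups_only, get_multicast_all) are hypothetical and not part of the patch. The fallback value 49 is taken from the include/linux/in.h hunk of this patch, and the option level is assumed to be IPPROTO_IP, as for the other IP_MULTICAST_* options.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Fallback for older userspace headers; 49 is the value this patch
 * adds to include/linux/in.h. */
#ifndef IP_MULTICAST_ALL
#define IP_MULTICAST_ALL 49
#endif

/* Hypothetical helper: clear IP_MULTICAST_ALL so the socket is only
 * handed multicast datagrams for groups it joined itself.
 * Returns 0 on success, -1 on failure with errno set (a kernel that
 * lacks the option would be expected to fail here). */
int receive_joined_groups_only(int fd)
{
    int off = 0;

    return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_ALL, &off, sizeof(off));
}

/* Hypothetical helper: read the option back.
 * Returns 0 or 1, or -1 if getsockopt() fails. */
int get_multicast_all(int fd)
{
    int val = -1;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, IPPROTO_IP, IP_MULTICAST_ALL, &val, &len) < 0)
        return -1;
    return val;
}
```

A caller would typically create the UDP socket, bind it, join its groups, and then call receive_joined_groups_only(fd). Going by the ip_sockglue.c hunk, values other than 0 and 1 take the e_inval path, so the option should be treated as a strict boolean.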



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Rémi Denis-Courmont May 29, 2009, 6:25 a.m. UTC | #1
On Thursday 28 May 2009 20:00:46 ext Nivedita Singhvi wrote:
> After some discussion offline with Christoph Lameter and David Stevens
> regarding multicast behaviour in Linux, I'm submitting a slightly
> modified patch from the one Christoph submitted earlier.
>
> This patch provides a new socket option IP_MULTICAST_ALL.
>
> In this case, default behaviour is _unchanged_ from the current
> Linux standard. The socket option is set by default to provide
> original behaviour. Sockets wishing to receive data only from
> multicast groups they join explicitly will need to clear this
> socket option.

You can already achieve this by checking the destination address in the 
SOL_PKTINFO ancillary data. Sure, it will cause extra context switches to 
process unwanted packets, but it will work with any kernel version.
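
The userspace filtering suggested above could look roughly like the following sketch. It assumes that "SOL_PKTINFO" refers to the IP_PKTINFO socket option (level IPPROTO_IP), that the application has already enabled it with setsockopt(), and that it receives with recvmsg() and a control buffer. The function name datagram_is_for_group is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* some libcs only expose struct in_pktinfo with this */
#endif
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: inspect the ancillary data filled in by recvmsg()
 * and compare the datagram's destination address with the wanted group.
 * Returns 1 on a match, 0 on a mismatch, and -1 if no IP_PKTINFO
 * control message is present. */
int datagram_is_for_group(struct msghdr *msg, struct in_addr group)
{
    struct cmsghdr *cm;

    for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
        if (cm->cmsg_level == IPPROTO_IP && cm->cmsg_type == IP_PKTINFO) {
            struct in_pktinfo pi;

            /* Copy out instead of casting: CMSG_DATA() is not
             * guaranteed to be suitably aligned. */
            memcpy(&pi, CMSG_DATA(cm), sizeof(pi));
            /* ipi_addr holds the destination address from the IP header. */
            return pi.ipi_addr.s_addr == group.s_addr;
        }
    }
    return -1;
}
```

The unwanted datagram has still been queued to the socket and the reader woken by the time this check runs, which is the cost the reply concedes.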
Neil Horman May 29, 2009, 10:35 a.m. UTC | #2
On Thu, May 28, 2009 at 10:00:46AM -0700, Nivedita Singhvi wrote:
> After some discussion offline with Christoph Lameter and David Stevens  
> regarding multicast behaviour in Linux, I'm submitting a slightly
> modified patch from the one Christoph submitted earlier.
>
> This patch provides a new socket option IP_MULTICAST_ALL.
>
> In this case, default behaviour is _unchanged_ from the current
> Linux standard. The socket option is set by default to provide original 
> behaviour. Sockets wishing to receive data only from multicast groups 
> they join explicitly will need to clear this socket option. 
>
>
> Signed-off-by: Nivedita Singhvi <niv@us.ibm.com>
> Signed-off-by: Christoph Lameter<cl@linux.com>
> Acked-by: David Stevens <dlstevens@us.ibm.com>
>
>
> diff -urN linux-2.6.29.2/include/linux/in.h linux-2.6.29.2.new/include/linux/in.h
> --- linux-2.6.29.2/include/linux/in.h	2009-04-27 13:37:11.000000000 -0400
> +++ linux-2.6.29.2.new/include/linux/in.h	2009-05-18 16:22:06.000000000 -0400
> @@ -107,6 +107,7 @@
> #define MCAST_JOIN_SOURCE_GROUP		46
> #define MCAST_LEAVE_SOURCE_GROUP	47
> #define MCAST_MSFILTER			48
> +#define IP_MULTICAST_ALL		49
>
> #define MCAST_EXCLUDE	0
> #define MCAST_INCLUDE	1
> diff -urN linux-2.6.29.2/include/net/inet_sock.h linux-2.6.29.2.new/include/net/inet_sock.h
> --- linux-2.6.29.2/include/net/inet_sock.h	2009-04-27 13:37:11.000000000 -0400
> +++ linux-2.6.29.2.new/include/net/inet_sock.h	2009-05-18 16:22:06.000000000 -0400
> @@ -130,7 +130,8 @@
> 				freebind:1,
> 				hdrincl:1,
> 				mc_loop:1,
> -				transparent:1;
> +				transparent:1,
> +				mc_all:1;
> 	int			mc_index;
> 	__be32			mc_addr;
> 	struct ip_mc_socklist	*mc_list;
> diff -urN linux-2.6.29.2/net/ipv4/af_inet.c linux-2.6.29.2.new/net/ipv4/af_inet.c
> --- linux-2.6.29.2/net/ipv4/af_inet.c	2009-04-27 13:37:11.000000000 -0400
> +++ linux-2.6.29.2.new/net/ipv4/af_inet.c	2009-05-18 16:22:06.000000000 -0400
> @@ -376,6 +376,7 @@
> 	inet->uc_ttl	= -1;
> 	inet->mc_loop	= 1;
> 	inet->mc_ttl	= 1;
> +	inet->mc_all	= 1;
> 	inet->mc_index	= 0;
> 	inet->mc_list	= NULL;
>
> diff -urN linux-2.6.29.2/net/ipv4/igmp.c linux-2.6.29.2.new/net/ipv4/igmp.c
> --- linux-2.6.29.2/net/ipv4/igmp.c	2009-04-27 13:37:11.000000000 -0400
> +++ linux-2.6.29.2.new/net/ipv4/igmp.c	2009-05-18 16:22:33.000000000 -0400
> @@ -2196,7 +2196,7 @@
> 			break;
> 	}
> 	if (!pmc)
> -		return 1;
> +		return inet->mc_all;

This change also filters out broadcasts sent to sockets listening on a given
port.  Is that what you want?

Neil

>
Christoph Lameter May 29, 2009, 2:12 p.m. UTC | #3
On Fri, 29 May 2009, Neil Horman wrote:

> > diff -urN linux-2.6.29.2/net/ipv4/igmp.c linux-2.6.29.2.new/net/ipv4/igmp.c
> > --- linux-2.6.29.2/net/ipv4/igmp.c	2009-04-27 13:37:11.000000000 -0400
> > +++ linux-2.6.29.2.new/net/ipv4/igmp.c	2009-05-18 16:22:33.000000000 -0400
> > @@ -2196,7 +2196,7 @@
> > 			break;
> > 	}
> > 	if (!pmc)
> > -		return 1;
> > +		return inet->mc_all;
>
> This change also filters out broadcasts sent to sockets listening on a given
> port.  Is that what you want?

Look at the beginning of the function:

	if (!ipv4_is_multicast(loc_addr))
		return 1;


will return 1 for broadcast.
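
To make the control flow under discussion easier to follow, here is a simplified userspace model of the decision, not the kernel code itself. It keeps only the three outcomes visible in this thread: non-multicast destinations always pass, joined groups pass, and unjoined groups fall back to mc_all. Source filters and the interface check are deliberately left out. The struct and function names are hypothetical, and addresses are in host byte order for readability.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-socket state touched by the patch. */
struct model_sock {
    unsigned int mc_all : 1;   /* mirrors the new inet->mc_all bit */
    const uint32_t *joined;    /* groups joined on this socket (host order) */
    size_t njoined;
};

/* 224.0.0.0/4 test on a host-byte-order address. */
static int model_is_multicast(uint32_t addr)
{
    return (addr & 0xf0000000u) == 0xe0000000u;
}

/* Returns 1 if the datagram would be delivered to the socket, else 0. */
int model_mc_allow(const struct model_sock *sk, uint32_t dst)
{
    size_t i;

    /* Early exit: broadcast and unicast never reach the mc_all check. */
    if (!model_is_multicast(dst))
        return 1;

    for (i = 0; i < sk->njoined; i++)
        if (sk->joined[i] == dst)
            return 1;          /* joined group; source filters omitted */

    /* No membership on this socket: the option decides. */
    return sk->mc_all;
}
```

In this model a destination of 255.255.255.255 is accepted whatever mc_all is set to, which matches the point made above about broadcast.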

Nivedita Singhvi May 29, 2009, 2:21 p.m. UTC | #4
Rémi Denis-Courmont wrote:

>> In this case, default behaviour is _unchanged_ from the current
>> Linux standard. The socket option is set by default to provide
>> original behaviour. Sockets wishing to receive data only from
>> multicast groups they join explicitly will need to clear this
>> socket option.
> 
> You can already achieve this by checking the destination address in the 
> SOL_PKTINFO ancillary data. Sure, it will cause extra context switches to 
> process unwanted packets but it will work with any kernel version.
> 

True, and depending on your environment and workload, this
is either a non-event or a big deal. For low-latency, real-time
and other performance-sensitive applications, this can be
an issue. Think of lots of threads (hundreds, perhaps) listening
on different channels (groups). They will likely all get
woken, scheduled, and possibly preempt running lower-priority
tasks to process a delivery that they don't need. Seen
across the system, the kernel's delivery of packets
to user-space processes that don't want them can be very
significant overhead (context switches, scheduling out/in, ...).

thanks,
Nivedita



Nivedita Singhvi May 29, 2009, 3:06 p.m. UTC | #5
Neil Horman wrote:

>> 	if (!pmc)
>> -		return 1;
>> +		return inet->mc_all;
> 
> This change also filters out broadcasts sent to sockets listening on a given
> port.  Is that what you want?

Yes, as Christoph pointed out, this does not impact
broadcast delivery, which will remain consistent with
original behaviour.

thanks,
Nivedita
David Miller June 2, 2009, 7:45 a.m. UTC | #6
From: Nivedita Singhvi <niv@us.ibm.com>
Date: Thu, 28 May 2009 10:00:46 -0700

> After some discussion offline with Christoph Lameter and David Stevens
> regarding multicast behaviour in Linux, I'm submitting a slightly
> modified patch from the one Christoph submitted earlier.
> 
> This patch provides a new socket option IP_MULTICAST_ALL.
> 
> In this case, default behaviour is _unchanged_ from the current
> Linux standard. The socket option is set by default to provide
> original behaviour. Sockets wishing to receive data only from
> multicast groups they join explicitly will need to clear this socket
> option.
> 
> 
> Signed-off-by: Nivedita Singhvi <niv@us.ibm.com>
> Signed-off-by: Christoph Lameter<cl@linux.com>
> Acked-by: David Stevens <dlstevens@us.ibm.com>

Applied, thanks.

Patch

diff -urN linux-2.6.29.2/include/linux/in.h linux-2.6.29.2.new/include/linux/in.h
--- linux-2.6.29.2/include/linux/in.h	2009-04-27 13:37:11.000000000 -0400
+++ linux-2.6.29.2.new/include/linux/in.h	2009-05-18 16:22:06.000000000 -0400
@@ -107,6 +107,7 @@ 
 #define MCAST_JOIN_SOURCE_GROUP		46
 #define MCAST_LEAVE_SOURCE_GROUP	47
 #define MCAST_MSFILTER			48
+#define IP_MULTICAST_ALL		49
 
 #define MCAST_EXCLUDE	0
 #define MCAST_INCLUDE	1
diff -urN linux-2.6.29.2/include/net/inet_sock.h linux-2.6.29.2.new/include/net/inet_sock.h
--- linux-2.6.29.2/include/net/inet_sock.h	2009-04-27 13:37:11.000000000 -0400
+++ linux-2.6.29.2.new/include/net/inet_sock.h	2009-05-18 16:22:06.000000000 -0400
@@ -130,7 +130,8 @@ 
 				freebind:1,
 				hdrincl:1,
 				mc_loop:1,
-				transparent:1;
+				transparent:1,
+				mc_all:1;
 	int			mc_index;
 	__be32			mc_addr;
 	struct ip_mc_socklist	*mc_list;
diff -urN linux-2.6.29.2/net/ipv4/af_inet.c linux-2.6.29.2.new/net/ipv4/af_inet.c
--- linux-2.6.29.2/net/ipv4/af_inet.c	2009-04-27 13:37:11.000000000 -0400
+++ linux-2.6.29.2.new/net/ipv4/af_inet.c	2009-05-18 16:22:06.000000000 -0400
@@ -376,6 +376,7 @@ 
 	inet->uc_ttl	= -1;
 	inet->mc_loop	= 1;
 	inet->mc_ttl	= 1;
+	inet->mc_all	= 1;
 	inet->mc_index	= 0;
 	inet->mc_list	= NULL;
 
diff -urN linux-2.6.29.2/net/ipv4/igmp.c linux-2.6.29.2.new/net/ipv4/igmp.c
--- linux-2.6.29.2/net/ipv4/igmp.c	2009-04-27 13:37:11.000000000 -0400
+++ linux-2.6.29.2.new/net/ipv4/igmp.c	2009-05-18 16:22:33.000000000 -0400
@@ -2196,7 +2196,7 @@ 
 			break;
 	}
 	if (!pmc)
-		return 1;
+		return inet->mc_all;
 	psl = pmc->sflist;
 	if (!psl)
 		return pmc->sfmode == MCAST_EXCLUDE;
diff -urN linux-2.6.29.2/net/ipv4/ip_sockglue.c linux-2.6.29.2.new/net/ipv4/ip_sockglue.c
--- linux-2.6.29.2/net/ipv4/ip_sockglue.c	2009-04-27 13:37:11.000000000 -0400
+++ linux-2.6.29.2.new/net/ipv4/ip_sockglue.c	2009-05-18 16:22:06.000000000 -0400
@@ -449,6 +449,7 @@ 
 			     (1<<IP_ROUTER_ALERT) | (1<<IP_FREEBIND) |
 			     (1<<IP_PASSSEC) | (1<<IP_TRANSPARENT))) ||
 	    optname == IP_MULTICAST_TTL ||
+	    optname == IP_MULTICAST_ALL ||
 	    optname == IP_MULTICAST_LOOP ||
 	    optname == IP_RECVORIGDSTADDR) {
 		if (optlen >= sizeof(int)) {
@@ -895,6 +896,13 @@ 
 		kfree(gsf);
 		break;
 	}
+	case IP_MULTICAST_ALL:
+		if (optlen<1)
+			goto e_inval;
+		if (val != 0 && val != 1)
+			goto e_inval;
+		inet->mc_all = val;
+		break;
 	case IP_ROUTER_ALERT:
 		err = ip_ra_control(sk, val ? 1 : 0, NULL);
 		break;
@@ -1147,6 +1155,9 @@ 
 		release_sock(sk);
 		return err;
 	}
+	case IP_MULTICAST_ALL:
+		val = inet->mc_all;
+		break;
 	case IP_PKTOPTIONS:
 	{
 		struct msghdr msg;