diff mbox

[net-next] bonding: Support for multi function NIC devices

Message ID 1342400890-32183-1-git-send-email-anirban.chakraborty@qlogic.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Anirban Chakraborty July 16, 2012, 1:08 a.m. UTC
From: Anirban Chakraborty <anirban.chakraborty@qlogic.com>

Add support to disable bonding of interfaces belonging to the same physical port. In
case of SRIOV or NIC partition mode, a single port of the adapter can have multiple
NIC functions. While bonding such interfaces, it is ensured that the NIC functions
belonging to the same physical port are not bonded together.

Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
---
 Documentation/networking/ifenslave.c |  208 +++++++++++++++++++++++++++++++++-
 1 files changed, 207 insertions(+), 1 deletions(-)

Comments

Jay Vosburgh July 16, 2012, 1:40 a.m. UTC | #1
Anirban Chakraborty <anirban.chakraborty@qlogic.com> wrote:

>From: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
>
>Add support to disable bonding of interfaces belonging to the same physical port. In
>case of SRIOV or NIC partition mode, a single port of the adapter can have multiple
>NIC functions. While bonding such interfaces, it is ensured that the NIC functions
>belonging to the same physical port are not bonded together.
>
>Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
>---
> Documentation/networking/ifenslave.c |  208 +++++++++++++++++++++++++++++++++-
> 1 files changed, 207 insertions(+), 1 deletions(-)
>
>diff --git a/Documentation/networking/ifenslave.c b/Documentation/networking/ifenslave.c
>index ac5debb..a0bdab9 100644
>--- a/Documentation/networking/ifenslave.c
>+++ b/Documentation/networking/ifenslave.c
>@@ -92,9 +92,14 @@
>  *    - 2003/12/01 - Shmulik Hen <shmulik.hen at intel dot com>
>  *	 - Code cleanup and style changes
>  *	   set version to 1.1.0
>+ *
>+ *    - 2012/07/15 - Anirban Chakraborty <anirban.chakraborty at qlogic dot com>
>+ *	 - Added support to disable bonding interfaces belonging to the
>+ *	   same physical port.
>+ *	   set version to 1.1.1

	This patch is all implemented within the ifenslave user space
program, which, to my knowledge, is not currently used by any major
distro to configure bonding.

	The configuration for bonding is typically performed by packages
such as initscripts or sysconfig, and this functionality would likely
need to go there.

	The only real use for ifenslave.c is on kernels without sysfs
compiled in.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
John Fastabend July 16, 2012, 4:39 a.m. UTC | #2
On 7/15/2012 6:40 PM, Jay Vosburgh wrote:
> Anirban Chakraborty <anirban.chakraborty@qlogic.com> wrote:
>
>> From: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
>>
>> Add support to disable bonding of interfaces belonging to the same physical port. In
>> case of SRIOV or NIC partition mode, a single port of the adapter can have multiple
>> NIC functions. While bonding such interfaces, it is ensured that the NIC functions
>> belonging to the same physical port are not bonded together.
>>
>> Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
>> ---
>> Documentation/networking/ifenslave.c |  208 +++++++++++++++++++++++++++++++++-
>> 1 files changed, 207 insertions(+), 1 deletions(-)
>>
>> diff --git a/Documentation/networking/ifenslave.c b/Documentation/networking/ifenslave.c
>> index ac5debb..a0bdab9 100644
>> --- a/Documentation/networking/ifenslave.c
>> +++ b/Documentation/networking/ifenslave.c
>> @@ -92,9 +92,14 @@
>>   *    - 2003/12/01 - Shmulik Hen <shmulik.hen at intel dot com>
>>   *	 - Code cleanup and style changes
>>   *	   set version to 1.1.0
>> + *
>> + *    - 2012/07/15 - Anirban Chakraborty <anirban.chakraborty at qlogic dot com>
>> + *	 - Added support to disable bonding interfaces belonging to the
>> + *	   same physical port.
>> + *	   set version to 1.1.1
>
> 	This patch is all implemented within the ifenslave user space
> program, which, to my knowledge, is not currently used by any major
> distro to configure bonding.
>
> 	The configuration for bonding is typically performed by packages
> such as initscripts or sysconfig, and this functionality would likely
> need to go there.
>
> 	The only real use for ifenslave.c is on kernels without sysfs
> compiled in.
>
> 	-J
>

Also I'm not sure we need to explicitly block this.  It is clear from 
looking at 'ip' output what the topology is. And in the SR-IOV
case would this still work if the functions are direct assigned? How
about if I try to bond two stacked devices that are on the same
physical link. In both case iirc the bus info wont match up.

Seems easier to just call this a configuration error or not if for
some reason this is really what someone intended.

.John
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Anirban Chakraborty July 16, 2012, 5:12 a.m. UTC | #3
On 7/15/12 9:39 PM, "John Fastabend" <john.r.fastabend@intel.com> wrote:

>On 7/15/2012 6:40 PM, Jay Vosburgh wrote:
>> Anirban Chakraborty <anirban.chakraborty@qlogic.com> wrote:
>>
>>> From: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
>>>
>>> Add support to disable bonding of interfaces belonging to the same
>>>physical port. In
>>> case of SRIOV or NIC partition mode, a single port of the adapter can
>>>have multiple
>>> NIC functions. While bonding such interfaces, it is ensured that the
>>>NIC functions
>>> belonging to the same physical port are not bonded together.
>>>
>>> Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
>>> ---
>>> Documentation/networking/ifenslave.c |  208
>>>+++++++++++++++++++++++++++++++++-
>>> 1 files changed, 207 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/Documentation/networking/ifenslave.c
>>>b/Documentation/networking/ifenslave.c
>>> index ac5debb..a0bdab9 100644
>>> --- a/Documentation/networking/ifenslave.c
>>> +++ b/Documentation/networking/ifenslave.c
>>> @@ -92,9 +92,14 @@
>>>   *    - 2003/12/01 - Shmulik Hen <shmulik.hen at intel dot com>
>>>   *	 - Code cleanup and style changes
>>>   *	   set version to 1.1.0
>>> + *
>>> + *    - 2012/07/15 - Anirban Chakraborty <anirban.chakraborty at
>>>qlogic dot com>
>>> + *	 - Added support to disable bonding interfaces belonging to the
>>> + *	   same physical port.
>>> + *	   set version to 1.1.1
>>
>> 	This patch is all implemented within the ifenslave user space
>> program, which, to my knowledge, is not currently used by any major
>> distro to configure bonding.
>>
>> 	The configuration for bonding is typically performed by packages
>> such as initscripts or sysconfig, and this functionality would likely
>> need to go there.
>>
>> 	The only real use for ifenslave.c is on kernels without sysfs
>> compiled in.
>>
>> 	-J
>>
>
>Also I'm not sure we need to explicitly block this.  It is clear from
>looking at 'ip' output what the topology is. And in the SR-IOV
>case would this still work if the functions are direct assigned? How
>about if I try to bond two stacked devices that are on the same
>physical link. In both case iirc the bus info wont match up.
>
>Seems easier to just call this a configuration error or not if for
>some reason this is really what someone intended.
>
>.John

I agree that for SR-IOV case with VFs assigned directly to the guest, bus
info won't
match up. However, I was thinking from the point of view of NIC
partitioned mode (NPAR),
and for the use case of SR-IOV VFs assigned to the hypervisor. It would be
nice to
prevent the user from getting into misconfiguration.

-Anirban


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jay Vosburgh July 16, 2012, 5:50 a.m. UTC | #4
Anirban Chakraborty <anirban.chakraborty@qlogic.com> wrote:
>On 7/15/12 9:39 PM, "John Fastabend" <john.r.fastabend@intel.com> wrote:
[...]
>>Also I'm not sure we need to explicitly block this.  It is clear from
>>looking at 'ip' output what the topology is. And in the SR-IOV
>>case would this still work if the functions are direct assigned? How
>>about if I try to bond two stacked devices that are on the same
>>physical link. In both case iirc the bus info wont match up.
>>
>>Seems easier to just call this a configuration error or not if for
>>some reason this is really what someone intended.
>>
>>.John
>
>I agree that for SR-IOV case with VFs assigned directly to the guest, bus
>info won't
>match up. However, I was thinking from the point of view of NIC
>partitioned mode (NPAR),
>and for the use case of SR-IOV VFs assigned to the hypervisor. It would be
>nice to
>prevent the user from getting into misconfiguration.

	If I'm understanding correctly, to hit the case you're worried
about here would require assigning multiple VFs from one PF to the same
linux instance as the PF itself, and then bonding those VFs together.

	Heck, there might be some arcane reason that somebody wants to
do that on purpose, or the test may inadvertently prohibit legal
configurations that happen to match the criteria.

	Has this been a real problem in practice?  I'm not seeing a
compelling argument for doing this.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Anirban Chakraborty July 16, 2012, 6:10 a.m. UTC | #5
On 7/15/12 10:50 PM, "Jay Vosburgh" <fubar@us.ibm.com> wrote:

>Anirban Chakraborty <anirban.chakraborty@qlogic.com> wrote:
>>On 7/15/12 9:39 PM, "John Fastabend" <john.r.fastabend@intel.com> wrote:
>[...]
>>>Also I'm not sure we need to explicitly block this.  It is clear from
>>>looking at 'ip' output what the topology is. And in the SR-IOV
>>>case would this still work if the functions are direct assigned? How
>>>about if I try to bond two stacked devices that are on the same
>>>physical link. In both case iirc the bus info wont match up.
>>>
>>>Seems easier to just call this a configuration error or not if for
>>>some reason this is really what someone intended.
>>>
>>>.John
>>
>>I agree that for SR-IOV case with VFs assigned directly to the guest, bus
>>info won't
>>match up. However, I was thinking from the point of view of NIC
>>partitioned mode (NPAR),
>>and for the use case of SR-IOV VFs assigned to the hypervisor. It would
>>be
>>nice to
>>prevent the user from getting into misconfiguration.
>
>	If I'm understanding correctly, to hit the case you're worried
>about here would require assigning multiple VFs from one PF to the same
>linux instance as the PF itself, and then bonding those VFs together.
>
>	Heck, there might be some arcane reason that somebody wants to
>do that on purpose, or the test may inadvertently prohibit legal
>configurations that happen to match the criteria.
>
>	Has this been a real problem in practice?  I'm not seeing a
>compelling argument for doing this.
>
>	-J

The real problem that we have faced so far is the case of NPAR, where
multiple
physical functions are assigned to the linux host and the customer tried
to create
a bond of interfaces from the same physical port. In this case, linux host
was running
without any guest OS and the NICs are used for carrying host traffic only.

-Anirban


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/Documentation/networking/ifenslave.c b/Documentation/networking/ifenslave.c
index ac5debb..a0bdab9 100644
--- a/Documentation/networking/ifenslave.c
+++ b/Documentation/networking/ifenslave.c
@@ -92,9 +92,14 @@ 
  *    - 2003/12/01 - Shmulik Hen <shmulik.hen at intel dot com>
  *	 - Code cleanup and style changes
  *	   set version to 1.1.0
+ *
+ *    - 2012/07/15 - Anirban Chakraborty <anirban.chakraborty at qlogic dot com>
+ *	 - Added support to disable bonding interfaces belonging to the
+ *	   same physical port.
+ *	   set version to 1.1.1
  */
 
-#define APP_VERSION	"1.1.0"
+#define APP_VERSION	"1.1.1"
 #define APP_RELDATE	"December 1, 2003"
 #define APP_NAME	"ifenslave"
 
@@ -111,6 +116,10 @@  static const char *usage_msg =
 "       ifenslave -c   <master-if> <slave-if>\n"
 "       ifenslave --help\n";
 
+static const char *misconfig_msg =
+"Use interfaces from different physical port for an ethernet adapter\n"
+"which has multiple NIC functions belonging to the same physical port\n";
+
 static const char *help_msg =
 "\n"
 "       To create a bond device, simply follow these three steps :\n"
@@ -208,6 +217,18 @@  struct dev_ifr {
 	int req_type;
 };
 
+/* port: physical port
+ * bus: PCI bus no.
+ * ifname: interface name
+ * driver: driver for this device
+ */
+struct dev_prop {
+	u8	port;
+	u16	bus;
+	char	ifname[IFNAMSIZ];
+	char	driver[IFNAMSIZ];
+};
+
 struct dev_ifr master_ifra[] = {
 	{&master_mtu,     "SIOCGIFMTU",     SIOCGIFMTU},
 	{&master_flags,   "SIOCGIFFLAGS",   SIOCGIFFLAGS},
@@ -224,6 +245,10 @@  struct dev_ifr slave_ifra[] = {
 
 static void if_print(char *ifname);
 static int get_drv_info(char *master_ifname);
+static int get_bus_info(struct dev_prop *interface);
+static int get_device_info(int count, struct dev_prop *prop, char *slaves[]);
+static int validate_slaves(char *master, char **spp);
+static int valid_slaves(int count, char *interfaces[]);
 static int get_if_settings(char *ifname, struct dev_ifr ifra[]);
 static int get_slave_flags(char *slave_ifname);
 static int set_master_hwaddr(char *master_ifname, struct sockaddr *hwaddr);
@@ -335,6 +360,15 @@  int main(int argc, char *argv[])
 		goto out;
 	}
 
+	/* validate the slave configuration */
+	if (!opt_d) {
+		res = validate_slaves(master_ifname, spp);
+		if (res) {
+			fprintf(stderr, "%s\n", misconfig_msg);
+			goto out;
+		}
+	}
+
 	slave_ifname = *spp++;
 
 	if (slave_ifname == NULL) {
@@ -643,6 +677,178 @@  out:
 	return 0;
 }
 
+/*
+ * Validate if specified interfaces do not belong to the same physical port
+ */
+static int validate_slaves(char *master, char **spp)
+{
+	int i, count = 0, res = 0;
+	struct ifreq ifr;
+	ifbond bond;
+	ifslave slv;
+	char *bslave;
+	char **slaves, **islaves, **tslaves;
+
+	/* Find a count for existing slave interfaces */
+	memset(&ifr, 0, sizeof(ifr));
+	ifr.ifr_data = (ifbond *)&bond;
+	strncpy(ifr.ifr_name, master, IFNAMSIZ);
+	if (ioctl(skfd, SIOCBONDINFOQUERY, &ifr) < 0) {
+		if (errno == EOPNOTSUPP) {
+			saved_errno = errno;
+			return 1;
+		}
+	}
+	islaves = spp;
+	while (*islaves++ != NULL)
+		count++;
+	if (!count)
+		return 0;
+	count += bond.num_slaves;
+	slaves = malloc(sizeof(char) * count * IFNAMSIZ);
+	if (slaves == NULL)
+		return 1;
+	memset(slaves, 0, (sizeof(char) * count * IFNAMSIZ));
+	tslaves = slaves;
+	/* find new interface names */
+	islaves = spp;
+
+	while (*islaves != NULL)
+		memcpy(slaves++, islaves++, IFNAMSIZ);
+	/* find existing slave interface names */
+	for (i = 0; i < bond.num_slaves; i++) {
+		memset(&slv, 0, sizeof(slv));
+		ifr.ifr_data = (ifslave *)&slv;
+		slv.slave_id = i;
+		if (ioctl(skfd, SIOCBONDSLAVEINFOQUERY, &ifr) < 0) {
+			if (errno == EOPNOTSUPP) {
+				saved_errno = errno;
+				res = 1;
+				goto out;
+			}
+		}
+		bslave = slv.slave_name;
+		memcpy(slaves++, &bslave, IFNAMSIZ);
+	}
+	res = valid_slaves(count, tslaves);
+out:
+	if (tslaves)
+		free(tslaves);
+	return res;
+}
+
+static int get_bus_info(struct dev_prop *interface)
+{
+	struct ifreq ifr;
+	struct ethtool_drvinfo info;
+	char *buf[4], *token, *tmp, *end;
+	int i = 0;
+
+	memset(&ifr, 0, sizeof(ifr));
+	strncpy(ifr.ifr_name, interface->ifname, IFNAMSIZ);
+	ifr.ifr_data = (caddr_t)&info;
+
+	info.cmd = ETHTOOL_GDRVINFO;
+	if (ioctl(skfd, SIOCETHTOOL, &ifr) < 0) {
+		if (errno == EOPNOTSUPP)
+			goto out;
+		saved_errno = errno;
+		v_print("Slave '%s': Error: get bonding info failed %s\n",
+			interface->ifname, strerror(saved_errno));
+		return 1;
+	}
+	memcpy(interface->driver, info.driver, strlen(info.driver) + 1);
+	token = strtok_r(info.bus_info, " :", &tmp);
+	buf[i] =  token;
+	while (token) {
+		token = strtok_r(NULL, " :.", &tmp);
+		buf[++i] = token;
+	}
+	interface->bus = strtoul(buf[1], &end, 16);
+	return 0;
+out:
+	return 1;
+}
+
+static int get_device_info(int count, struct dev_prop *prop, char *slaves[])
+{
+	int ret = -1, i;
+	struct ifreq ifr;
+	struct ethtool_cmd info;
+
+	for (i = 0; i < count; i++) {
+		strncpy(prop[i].ifname, slaves[i], IFNAMSIZ);
+		memset(&ifr, 0, sizeof(ifr));
+		strncpy(ifr.ifr_name, prop[i].ifname, IFNAMSIZ);
+		ifr.ifr_data = (caddr_t)&info;
+
+		info.cmd = ETHTOOL_GSET;
+
+		if (ioctl(skfd, SIOCETHTOOL, &ifr) < 0) {
+			if (errno == EOPNOTSUPP)
+				goto out;
+			saved_errno = errno;
+			v_print("Slave '%s': Error: get bonding info failed"
+				" %s\n", prop[i].ifname,
+				strerror(saved_errno));
+			ret = 1;
+			goto out;
+		}
+		prop[i].port = info.phy_address;
+		ret = get_bus_info(&prop[i]);
+		if (ret)
+			goto out;
+	}
+out:
+	return ret;
+}
+
+/* For a given set of interfaces, find out if they belong to the
+ * same physical port. Return true if two interfaces are found to
+ * be from same physical port, otherwise return false.
+ */
+
+static int valid_slaves(int count, char *slaves[])
+{
+	int i, j, ret = 0;
+	struct dev_prop *ifprop, *searchif;
+	struct dev_prop *prop;
+
+	prop = malloc(count * sizeof(struct dev_prop));
+	if (prop == NULL)
+		return 1;
+
+	memset(prop, 0, sizeof(struct dev_prop) * count);
+	ret = get_device_info(count, prop, slaves);
+	/* Iterate over the array of interfaces and find a match */
+	for (j = 0; j < count; j++) {
+		ifprop = &prop[j];
+		for (i = j + 1; i < count; i++) {
+			searchif = &prop[i];
+			/* Compare driver names */
+			if (!strncmp(ifprop->driver, searchif->driver, IFNAMSIZ)
+				&& !strncmp(ifprop->ifname, searchif->ifname,
+				IFNAMSIZ))
+				continue;
+			/* Compare physical port and bus of the interfaces */
+			if ((searchif->bus == ifprop->bus) &&
+				(searchif->port == ifprop->port)) {
+				ret = 1;
+				fprintf(stderr,
+					"slave interfaces %s and %s "
+					"belong to the same physical "
+					"port of the adapter\n",
+					searchif->ifname, ifprop->ifname);
+				goto out;
+			}
+		}
+	}
+out:
+	free(prop);
+	prop = NULL;
+	return ret;
+}
+
 static int change_active(char *master_ifname, char *slave_ifname)
 {
 	struct ifreq ifr;