diff mbox

[4/5] virtio_net: Add a MAC filter table

Message ID 20090116211334.22836.72681.stgit@debian.lart
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Alex Williamson Jan. 16, 2009, 9:13 p.m. UTC
Make use of the MAC_TABLE control virtqueue class to support a
MAC filter table.  The size of the filter table defaults to 16
entries and can be adjusted via the mac_entries module parameter.
Note, the original hardware address does not count towards this.

As with most real hardware, unicast addresses have priority in
the filter table so we can avoid enabling full promiscuous
until both unicast and multicast address overflow.

Signed-off-by: Alex Williamson <alex.williamson@hp.com
---

 drivers/net/virtio_net.c   |   82 +++++++++++++++++++++++++++++++++++++++++++-
 include/linux/virtio_net.h |   20 +++++++++++
 2 files changed, 100 insertions(+), 2 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Mark McLoughlin Jan. 19, 2009, 9:33 a.m. UTC | #1
On Fri, 2009-01-16 at 14:13 -0700, Alex Williamson wrote:
> Make use of the MAC_TABLE control virtqueue class to support a
> MAC filter table.  The size of the filter table defaults to 16
> entries and can be adjusted via the mac_entries module parameter.
> Note, the original hardware address does not count towards this.
> 
> As with most real hardware, unicast addresses have priority in
> the filter table so we can avoid enabling full promiscuous
> until both unicast and multicast address overflow.
> 
> Signed-off-by: Alex Williamson <alex.williamson@hp.com

Acked-by: Mark McLoughlin <markmc@redhat.com>

Cheers,
Mark.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson Jan. 27, 2009, 3:38 a.m. UTC | #2
Hi Rusty,

On Tue, 2009-01-27 at 13:00 +1030, Rusty Russell wrote:
> On Saturday 17 January 2009 07:43:34 Alex Williamson wrote:
> > As with most real hardware, unicast addresses have priority in
> 
> > the filter table so we can avoid enabling full promiscuous
> 
> > until both unicast and multicast address overflow.
> 
> Why not pretend to have infinite, and have the host turn promisc on
> when *it*
> 
> decides? Skip the alloc call, and just use a feature bit like
> everything else?

I suppose it's just a matter of where do you want to add the smarts and
the tune-ability.  Since we can't actually have an infinite table and an
array implementation seems to make sense from an efficiency standpoint,
it needs to be defined by someone to be a fixed size before we start
using it.  I was hoping the guest driver might have a better idea how it
plans to use the filter table and that there'd be some benefit to having
that handshake happen between the driver and the backend.  The module
parameter fell out of this and seems rather convenient.

I could pursue this is you like, but I'm not sure of the benefit,
particularly if we want to give the user some control of the size of the
actual table.  Thoughts?  Thanks for the comments,

Alex
Rusty Russell Jan. 28, 2009, 10:45 a.m. UTC | #3
On Tuesday 27 January 2009 14:08:02 Alex Williamson wrote:
> Hi Rusty,
> 
> On Tue, 2009-01-27 at 13:00 +1030, Rusty Russell wrote:
> > On Saturday 17 January 2009 07:43:34 Alex Williamson wrote:
> > > As with most real hardware, unicast addresses have priority in
> > 
> > > the filter table so we can avoid enabling full promiscuous
> > 
> > > until both unicast and multicast address overflow.
> > 
> > Why not pretend to have infinite, and have the host turn promisc on
> > when *it*
> > 
> > decides? Skip the alloc call, and just use a feature bit like
> > everything else?
> 
> I suppose it's just a matter of where do you want to add the smarts and
> the tune-ability.  Since we can't actually have an infinite table and an
> array implementation seems to make sense from an efficiency standpoint,
> it needs to be defined by someone to be a fixed size before we start
> using it.  I was hoping the guest driver might have a better idea how it
> plans to use the filter table and that there'd be some benefit to having
> that handshake happen between the driver and the backend.  The module
> parameter fell out of this and seems rather convenient.
> 
> I could pursue this is you like, but I'm not sure of the benefit,
> particularly if we want to give the user some control of the size of the
> actual table.  Thoughts?  Thanks for the comments,

I don't think the either-or case is real.  Say the user decides they want a table of 1000000 entries.  And the backend says "no way, I have a 16 array"?  Currently you get nothing.

I guess your qemu code does dynamic allocation.  But I'm sure you put a limit in there, right? :)

We don't want some complex negotiation, and I don't think the guest has any more clue than the host, nor can do much about it.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson Jan. 28, 2009, 5:48 p.m. UTC | #4
Hi Rusty,

On Wed, 2009-01-28 at 21:15 +1030, Rusty Russell wrote:
> On Tuesday 27 January 2009 14:08:02 Alex Williamson wrote:
> > Hi Rusty,
> > 
> > On Tue, 2009-01-27 at 13:00 +1030, Rusty Russell wrote:
> > > On Saturday 17 January 2009 07:43:34 Alex Williamson wrote:
> > > > As with most real hardware, unicast addresses have priority in
> > > 
> > > > the filter table so we can avoid enabling full promiscuous
> > > 
> > > > until both unicast and multicast address overflow.
> > > 
> > > Why not pretend to have infinite, and have the host turn promisc on
> > > when *it*
> > > 
> > > decides? Skip the alloc call, and just use a feature bit like
> > > everything else?
> > 
> > I suppose it's just a matter of where do you want to add the smarts and
> > the tune-ability.  Since we can't actually have an infinite table and an
> > array implementation seems to make sense from an efficiency standpoint,
> > it needs to be defined by someone to be a fixed size before we start
> > using it.  I was hoping the guest driver might have a better idea how it
> > plans to use the filter table and that there'd be some benefit to having
> > that handshake happen between the driver and the backend.  The module
> > parameter fell out of this and seems rather convenient.
> > 
> > I could pursue this is you like, but I'm not sure of the benefit,
> > particularly if we want to give the user some control of the size of the
> > actual table.  Thoughts?  Thanks for the comments,
> 
> I don't think the either-or case is real.  Say the user decides they
> want a table of 1000000 entries.  And the backend says "no way, I have
> a 16 array"?  Currently you get nothing.

Ok, I think I see what you're getting at.  Linux can handle that, but
other OSes may not...

> I guess your qemu code does dynamic allocation.  But I'm sure you put
> a limit in there, right? :)

Right, it allocates the table only when the alloc call is made.  The
limit is currently what will fit in a host page size, which is perhaps a
little arbitrary, but I had some FUD about multi-page sg lists.  The
resulting 682 entry (on 4k) limit seemed sufficiently large.

> We don't want some complex negotiation, and I don't think the guest
> has any more clue than the host, nor can do much about it.

For a typical Linux guest, I agree, and I would guess that most users
won't know or care about the module option to specify the size, nor
should qemu ever balk at allocating the default size Linux requests.
However, I also know of a more embedded, proprietary OS that lives in a
fairly rigid network environment and does have a specific number of
entries they're looking for.  Other OSes might have different numbers.

Here's what I believe to be the parameters around which I've designed
the current interface:

     A. Receiving unwanted packets is a waste of resources.
     B. Proprietary OSes may have dependencies on hardware filtering and
        may not be able to handle unwanted packets.
     C. Different guests may want different filter table sizes.
     D. The guest can make better decisions than the host about what
        packets are unwanted.

>From this, I derive that the guest needs to know the size of the filter
table, needs to be able to specify the size of the filter table, and is
in the best position to program the filter table.  A pseudo-infinite
table violates the condition of not sending the guest unwanted packets.
We could also go down the path of deciding what a "big enough" table is,
and making it part of the ABI to avoid the alloc, but then we may have
to revise the ABI if we underestimate (whereas the current limit is an
implementation detail).

Weaving in your comments from other parts of this series, would it help
at all to distinguish EINVAL from ENOMEM for the alloc call?  Maybe the
caller could make some kind of intelligent decision based on that (ex.
unimplemented feature vs try something smaller).  Thanks for the
comments,

Alex
Rusty Russell Jan. 28, 2009, 11:55 p.m. UTC | #5
On Thursday 29 January 2009 04:18:28 Alex Williamson wrote:
> Hi Rusty,

Hi Alex,

   I've cc'd Herbert: he always has good thoughts about this kind of thing and I want to be sure you're getting a fair hearing.

> Here's what I believe to be the parameters around which I've designed
> the current interface:
> 
>      A. Receiving unwanted packets is a waste of resources.
>      B. Proprietary OSes may have dependencies on hardware filtering and
>         may not be able to handle unwanted packets.
>      C. Different guests may want different filter table sizes.
>      D. The guest can make better decisions than the host about what
>         packets are unwanted.

A general OS has to handle the "unwanted packets" case if the NIC's table isn't big enough.  A guest may want arbitrary filter table sizes, but you're not giving it to them anyway, so that argument is bogus too.

The host *has* to decide the max table size, the guest *doesn't*.  Your implementation is the worst of both worlds.  The host doesn't tell the guest what the limit is, but does fail it for exceeding it.  Your guest implementation limits itself arbitrarily, but you feel better because you've invoked yet-another party (the admin of the user box) to fix it!

And there's no evidence whatsoever that the guest can do anything rational if it hits the filtering limit that the host can't do.

> >From this, I derive that the guest needs to know the size of the filter
> table, needs to be able to specify the size of the filter table, and is
> in the best position to program the filter table.  A pseudo-infinite
> table violates the condition of not sending the guest unwanted packets.

I'm not aware of any OS which doesn't filter this anyway, but I'm not familiar with the Windows stack.  I'd be very surprised though, since multicast filtering is often implemented as a hash on the card.

> We could also go down the path of deciding what a "big enough" table is,
> and making it part of the ABI to avoid the alloc, but then we may have
> to revise the ABI if we underestimate (whereas the current limit is an
> implementation detail).

OK, say the host bonds a NIC directly to the virtio nic for the guest.  That NIC uses a hash to filter MAC addresses.  In your scheme, you have numerous problems:
(1) we've defined filtering to be perfect, so we need to filter in the host,
(2) the guest will stop giving us filtering information after 16 entries, and
    tell us to go into promisc mode, even though we could do better.

> Weaving in your comments from other parts of this series, would it help
> at all to distinguish EINVAL from ENOMEM for the alloc call?  Maybe the
> caller could make some kind of intelligent decision based on that (ex.
> unimplemented feature vs try something smaller).

None of the currently defined calls should fail.  Feature bits should guarantee the existence of features, and failure should be greeted with horror by the guest, which should then try to limp along.

So, this conversation has convinced me that the host should accept arbitrary filtering entries, and the guest should accept that it is best effort.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu Jan. 29, 2009, 12:34 a.m. UTC | #6
On Thu, Jan 29, 2009 at 10:25:46AM +1030, Rusty Russell wrote:
>
> So, this conversation has convinced me that the host should
> accept arbitrary filtering entries, and the guest should accept
> that it is best effort.

I agree with this completely.

Thanks,
David Stevens Jan. 29, 2009, 6:17 a.m. UTC | #7
I haven't been following this closely, so apologies if the point's been 
made, or
if you're talking about unicast addresses here too, but just to be clear:

For multicasting, false positives are ok, false negatives are not 
(non-functional),
and if the fixed-size address filter is exceeded, _multicast_promiscuous_
(but not all unicasts, so not promiscuous mode) is the "good" thing to do.
So "best effort" still shouldn't lead to an address you previously joined 
not
being passed because a new one is added.

Also, if you can't keep all the MAC multicast addresses (ie,
the limit is memory and not look-up speed), then getting
out of multicast-promiscuous mode correctly isn't easy
since you don't know what groups you "forgot". You could
rebuild from the protocol memberships, if you know when
you've left enough groups to fit, but otherwise the MAC
multicast addresses you didn't keep of course won't work if you
leave multicast-promiscuous mode and the filter doesn't
have them.

So, if you're talking about not being able to fit all
the address (vs. not wanting to search that many), then
I'd suggest either staying in MP mode until ifdown, or
making the join a hard failure at the limit in the first
place.

                                                        +-DLS

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rusty Russell Jan. 30, 2009, 7:03 a.m. UTC | #8
On Thursday 29 January 2009 16:47:55 David Stevens wrote:
> Also, if you can't keep all the MAC multicast addresses (ie,
> the limit is memory and not look-up speed), then getting
> out of multicast-promiscuous mode correctly isn't easy
> since you don't know what groups you "forgot". You could
> rebuild from the protocol memberships, if you know when
> you've left enough groups to fit, but otherwise the MAC
> multicast addresses you didn't keep of course won't work if you
> leave multicast-promiscuous mode and the filter doesn't
> have them.

The command is simply "set the filter", not "add" and "delete", so the host doesn't have this problem.

VLAN has more of an issue: that has add and del.  So the host can do a counting hash, or say remember 16 and then a count of overflows, but both will be suboptimal over time.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index da96368..9be0d6a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -32,6 +32,11 @@  static int csum = 1, gso = 1;
 module_param(csum, bool, 0444);
 module_param(gso, bool, 0444);
 
+static unsigned int mac_entries = 16;
+module_param(mac_entries, uint, 0444);
+MODULE_PARM_DESC(mac_entries,
+	"Number of entries in the MAC filter table.");
+
 /* FIXME: MTU in config. */
 #define MAX_PACKET_LEN (ETH_HLEN+ETH_DATA_LEN)
 #define GOOD_COPY_LEN	128
@@ -667,9 +672,64 @@  static void virtnet_set_rx_mode(struct net_device *dev)
 	if (!vi->cvq)
 		return;
 
-	promisc = ((dev->flags & IFF_PROMISC) != 0 || dev->uc_count > 0);
-	allmulti = ((dev->flags & IFF_ALLMULTI) != 0 || dev->mc_count > 0);
+	promisc = ((dev->flags & IFF_PROMISC) != 0);
+	allmulti = ((dev->flags & IFF_ALLMULTI) != 0);
+
+	if (dev->uc_count > mac_entries)
+		promisc = 1;
+	else if (dev->uc_count + dev->mc_count > mac_entries)
+		allmulti = 1;
+
+	if (!promisc && (dev->uc_count || (dev->mc_count && !allmulti))) {
+		u8 *buf, *cur;
+		int count, i;
+		struct dev_addr_list *uc_ptr, *mc_ptr;
+
+		count = dev->uc_count + (allmulti ? 0 : dev->mc_count);
+
+		buf = kzalloc(count * ETH_ALEN, GFP_ATOMIC);
+		if (!buf) {
+			printk(KERN_WARNING "%s: "
+			       "Failed to alloc MAC table, using promisc.\n", 
+			       dev->name);
+			promisc = 1;
+			goto set_mode;
+		}
+
+		cur = buf;
+		uc_ptr = dev->uc_list;
+		mc_ptr = dev->mc_list;
+
+		for (i = 0; i < dev->uc_count; i++) {
+			memcpy(cur, uc_ptr->da_addr, ETH_ALEN);
+			cur += ETH_ALEN;
+			uc_ptr = uc_ptr->next;
+		}
+		if (!allmulti) {
+			for (i = 0; i < dev->mc_count; i++) {
+				memcpy(cur, mc_ptr->da_addr, ETH_ALEN);
+				cur += ETH_ALEN;
+				mc_ptr = mc_ptr->next;
+			}
+		}
+		if (virtnet_send_command(vi, VIRTIO_NET_CTRL_MAC_TABLE,
+					 VIRTIO_NET_CTRL_MAC_TABLE_SET,
+					 buf, count * ETH_ALEN)) {
+			printk(KERN_WARNING "%s: "
+			       "Failed to set MAC filter, using promisc.\n",
+			       dev->name);
+			promisc = 1;
+		}
+		kfree(buf);
+	} else {
+		/* Set an empty MAC table - disabled */
+		if (virtnet_send_command(vi, VIRTIO_NET_CTRL_MAC_TABLE,
+				VIRTIO_NET_CTRL_MAC_TABLE_SET, NULL, 0))
+			printk(KERN_WARNING "%s: "
+			       "Failed to clear MAC filter.\n", dev->name);
+	}
 
+set_mode:
 	if (virtnet_send_command(vi, VIRTIO_NET_CTRL_RX_MODE,
 				 VIRTIO_NET_CTRL_RX_MODE_PROMISC,
 				 &promisc, sizeof(promisc)))
@@ -801,6 +861,24 @@  static int virtnet_probe(struct virtio_device *vdev)
 	vi->cvq = vdev->config->find_vq(vdev, 2, NULL);
 	if (IS_ERR(vi->cvq))
 		vi->cvq = NULL;
+	else {
+		unsigned int entries;
+
+		/*
+		 * We use a separate stack variable here because the
+		 * data segment for the static variable doesn't translate
+		 * across the virtqueue.
+		 */
+		entries = mac_entries = min(mac_entries,
+					(unsigned int)(PAGE_SIZE / ETH_ALEN));
+		if (virtnet_send_command(vi, VIRTIO_NET_CTRL_MAC_TABLE,
+					 VIRTIO_NET_CTRL_MAC_TABLE_ALLOC,
+					 &entries, sizeof(entries))) {
+			printk(KERN_WARNING "virtio-net: "
+			       "MAC filter table allocation failed.\n");
+			mac_entries = 0;
+		}
+	}
 
 	/* Initialize our empty receive and send queues. */
 	skb_queue_head_init(&vi->recv);
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index f8afef3..84086a6 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -79,4 +79,24 @@  typedef __u8 virtio_net_ctrl_ack;
  #define VIRTIO_NET_CTRL_RX_MODE_PROMISC      0
  #define VIRTIO_NET_CTRL_RX_MODE_ALLMULTI     1
 
+/*
+ * Control the MAC filter table.
+ *
+ * The ALLOC command requires a 4 byte sg entry indicating the size of
+ * the MAC filter table to be allocated in number of entries
+ * (ie. bytes = entries * ETH_ALEN).  The MAC filter table may only be
+ * allocated once after a device reset.  A device reset frees the MAC
+ * filter table, allowing a new ALLOC.  The current implementation limits
+ * the size to a single host page.
+ *
+ * The SET command requires an out sg entry containing a buffer of the
+ * entire MAC filter table.  The format is a simple byte stream
+ * concatenating all of the ETH_ALEN MAC adresses to be inserted into
+ * the table.  Partial updates are not available.  The SET command can
+ * only succeed if there is a table allocated.
+ */
+#define VIRTIO_NET_CTRL_MAC_TABLE  1
+ #define VIRTIO_NET_CTRL_MAC_TABLE_ALLOC      0
+ #define VIRTIO_NET_CTRL_MAC_TABLE_SET        1
+
 #endif /* _LINUX_VIRTIO_NET_H */