mbox

[GIT/PATCH,v3] xen network backend driver

Message ID 1298914061.5034.996.camel@zakaz.uk.xensource.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Pull-request

git://xenbits.xen.org/people/ianc/linux-2.6.git upstream/dom0/backend/netback

Message

Ian Campbell Feb. 28, 2011, 5:27 p.m. UTC
The following patch is the third iteration of the Xen network backend
driver for upstream Linux.

This driver ("netback") is the host side counterpart to the frontend
driver in drivers/net/xen-netfront.c. The PV protocol is also
implemented by frontend drivers in other OSes too, such as the BSDs and
even Windows.

Since this is the third posting I think it is time I started posting
actual pull requests. The complete patch is still appended for ease of
review.

The following changes since commit 2e820f58f7ad8eaca2f194ccdfea0de63e9c6d78:
  Ian Campbell (1):
        xen/irq: implement bind_interdomain_evtchn_to_irqhandler for backend drivers

are available in the git repository at:

  git://xenbits.xen.org/people/ianc/linux-2.6.git upstream/dom0/backend/netback

Bastian Blank (1):
      xen: netback: Fix null-pointer access in netback_uevent

Christophe Saout (1):
      xen: netback: use dev_name() instead of removed ->bus_id.

Dongxiao Xu (5):
      xen: netback: Move global/static variables into struct xen_netbk.
      xen: netback: Introduce a new struct type page_ext.
      xen: netback: Multiple tasklets support.
      xen: netback: Use Kernel thread to replace the tasklet.
      xen: netback: Set allocated memory to zero from vmalloc.

Ian Campbell (58):
      xen: netback: Initial import of linux-2.6.18-xen.hg netback driver.
      xen: netback: first cut at porting to upstream and cleaning up
      xen: netback: add ethtool stat to track copied skbs.
      xen: netback: make queue length parameter writeable in sysfs
      xen: netback: parent sysfs device should be set before registering.
      xen: rename netbk module xen-netback.
      xen: netback: remove unused xen_network_done code
      xen: netback: factor disconnect from backend into new function.
      xen: netback: wait for hotplug scripts to complete before signalling connected to frontend
      xen: netback: Always pull through PKT_PROT_LEN bytes into the linear part of an skb.
      xen: netback: Allow setting of large MTU before rings have connected.
      xen: netback: correctly setup skb->ip_summed on receive
      xen: netback: handle NET_SKBUFF_DATA_USES_OFFSET correctly
      xen: netback: drop frag member from struct netbk_rx_meta
      xen: netback: linearise SKBs as we copy them into guest memory on guest-RX.
      xen: netback: drop more relics of flipping mode
      xen: netback: check if foreign pages are actually netback-created foreign pages.
      xen: netback: do not unleash netback threads until initialisation is complete
      xen: netback: save interrupt state in add_to_net_schedule_list_tail
      xen: netback: increase size of rx_meta array.
      xen: netback: take net_schedule_list_lock when removing entry from net_schedule_list
      xen: netback: Drop GSO SKBs which do not have csum_blank.
      xen: netback: completely remove tx_queue_timer
      Revert "xen: netback: Drop GSO SKBs which do not have csum_blank."
      xen: netback: handle incoming GSO SKBs which are not CHECKSUM_PARTIAL
      xen: netback: rationalise types used in count_skb_slots
      xen: netback: refactor logic for moving to a new receive buffer.
      xen: netback: refactor code to get next rx buffer into own function.
      xen: netback: simplify use of netbk_add_frag_responses
      xen: netback: cleanup coding style
      xen: netback: drop private ?PRINTK macros in favour of pr_*
      xen: netback: move under drivers/net/xen-netback/
      xen: netback: remove queue_length module option
      xen: netback: correct error return from ethtool hooks.
      xen: netback: avoid leading _ in function parameter names.
      xen: netback: drop unused debug interrupt handler.
      xen: netif: properly namespace the Xen netif protocol header.
      xen: netif: improve Kconfig help text for front- and backend drivers.
      xen: netback: drop ethtool drvinfo callback
      xen: netback: use xen_netbk prefix where appropriate
      xen: netback: refactor to make all xen_netbk knowledge internal to netback.c
      xen: netback: use xenvif_ prefix where appropriate
      xen: netback: add reference from xenvif to xen_netbk
      xen: netback: refactor to separate network device from worker pools
      xen: netback: switch to kthread mode and drop tasklet mode
      xen: netback: handle frames whose head crosses a page boundary
      xen: netback: return correct values from start_xmit
      xen: netback: remove useless memset to zero.
      xen: netback: use register_netdev()
      xen: netback: simplify unwinding netback_init's work on failure.
      xen: netback: use core network carrier flag.
      xen: netback: s/xenvif_queue_full/xenvif_rx_queue_full/
      xen: netback: add xenvif_rx_schedulable
      xen: netback: further separate xen_netbk and xenvif
      xen: netback: use netdev_LEVEL instead of pr_LEVEL
      xen: netback: drop rx_notify and notify_list array in favour of a normal list
      xen: netback: Make dependency on PageForeign conditional
      xen: netback: completely drop foreign page support

James Harper (1):
      xen: netback: avoid null-pointer access in netback_uevent

Jan Beulich (1):
      xen: netback: unmap tx ring gref when mapping of rx ring gref failed

Jeremy Fitzhardinge (21):
      xen: netback: don't include xen/evtchn.h
      xen: netback: use mod_timer
      xen: netback: use NET_SKB_PAD rather than "16"
      xen: netback: completely drop flip support
      xen: netback: demacro MASK_PEND_IDX
      xen: netback: convert PEND_RING_IDX into a proper typedef name
      xen: netback: rename NR_PENDING_REQS to nr_pending_reqs()
      xen: netback: pre-initialize list and spinlocks; use empty list to indicate not on list
      xen: netback: remove CONFIG_XEN_NETDEV_PIPELINED_TRANSMITTER
      xen: netback: make netif_get/put inlines
      xen: netback: move code around
      xen: netback: document PKT_PROT_LEN
      xen: netback: convert to net_device_ops
      xen: netback: reinstate missing code
      xen: netback: remove debug noise
      xen: netback: don't screw around with packet gso state
      xen: netback: use dev_get/set_drvdata() inteface
      xen: netback: include linux/sched.h for TASK_* definitions
      xen: netback: use get_sset_count rather than obsolete get_stats_count
      xen: netback: minor code formatting fixup
      xen: netback: only initialize for PV domains

Keir Fraser (1):
      xen: netback: Fixes for delayed copy of tx network packets.

Konrad Rzeszutek Wilk (1):
      Fix compile warnings: ignoring return value of 'xenbus_register_backend' ..

Paul Durrant (8):
      xen: netback: Fix basic indentation issue
      xen: netback: Add a new style of passing GSO packets to frontends.
      xen: netback: Make frontend features distinct from netback feature flags.
      xen: netback: Re-define PKT_PROT_LEN to be bigger.
      xen: netback: Don't count packets we don't actually receive.
      xen: netback: Remove the 500ms timeout to restart the netif queue.
      xen: netback: Add a missing test to tx_work_todo.
      xen: netback: Re-factor net_tx_action_dealloc() slightly.

Steven Smith (2):
      xen: netback: make sure that pg->mapping is never NULL for a page mapped from a foreign domain.
      xen: netback: try to pull a minimum of 72 bytes into the skb data area

 drivers/net/Kconfig                 |   38 +-
 drivers/net/Makefile                |    1 +
 drivers/net/xen-netback/Makefile    |    3 +
 drivers/net/xen-netback/common.h    |  162 ++++
 drivers/net/xen-netback/interface.c |  424 +++++++++
 drivers/net/xen-netback/netback.c   | 1745 +++++++++++++++++++++++++++++++++++
 drivers/net/xen-netback/xenbus.c    |  490 ++++++++++
 drivers/net/xen-netfront.c          |   20 +-
 include/xen/interface/io/netif.h    |   80 +-
 9 files changed, 2909 insertions(+), 54 deletions(-)
 create mode 100644 drivers/net/xen-netback/Makefile
 create mode 100644 drivers/net/xen-netback/common.h
 create mode 100644 drivers/net/xen-netback/interface.c
 create mode 100644 drivers/net/xen-netback/netback.c
 create mode 100644 drivers/net/xen-netback/xenbus.c

Changes since the second posting, mostly due to Francois Romieu and
Konrad Rzeszutek Wilk's review, include:
      * Rebased ontop of 2.6.38-rc2 (my branch with the generic xenbus
        backend support got rebased somewhere on its journey into
        mainline 2.6.38-rc1 and so this rebase was needed to resolve
        that). The rebase also included moving the patches on top of the
        last precursor patch (addition of
        bind_interdomain_evtchn_to_irqhandler, which is in linux-next).
      * Dropped receive notify arrays in favour of simple lists.
      * Dropped private carrier flag. It's not clear that the reasons
        for adding this, which may well have been valid in 2.6.18, are
        still valid today. We can revisit in the future if and when
        required.
      * A slight refactoring to more completely encapsulate driver stuff
        in interface.c and backend worker pool stuff in netback.c
      * Use netdev_XXX instead of pr_XXX where appropriate
      * Various coding style etc related tweaks suggested during review.

Changes since the first posting, many due to Ben Hutching's review,
include:

      * Improved Kconfig description for XEN_NETDEV_BACKEND and
        XEN_NETDEV_FRONTEND.
      * Avoid the core networking namespaces (skb_*, netif_*, net_*).
        This led to a major refactoring since the current namespace use
        was something of a mess. Now the code tries to consistently use
        xenvif* for the device driver related stuff (interface.c) and
        xen_netbk* for the backend worker pool related stuff
        (netback.c). This cleanup extended to the
        xen/interface/io/netif.h header which required changes to
        netfront too.
      * Dropped the tasklet mode for the backend worker leaving only the
        kthread mode. I will revisit the suggestion to use NAPI on the
        driver side in the future, I think it's somewhat orthogonal to
        the use of kthread here, but it seems likely to be a worthwhile
        improvement either way.
      * Dropped netbk_copy_skb. Ben requested this function be made
        generic and moved to the networking core but it turns out it was
        trivial to remove netback's reliance on this functionality, and
        avoid a bunch of unnecessary copying in the process. The
        function's semantics were a bit odd in any case so I couldn't
        imagine many other users.
      * Handle incoming GSO SKBs which are not CHECKSUM_PARTIAL
        correctly. Changed from previous behaviour (dropping the skb) to
        doing a fixup after discussion of equivalent frontend patch
        which became e0ce4af920eb028f38bfd680b1d733f4c7a0b7cf.
      * Other improvements suggested by Ben (e.g. dropping pointless
        filename references from top of file comments, not including
        version.h, correct return values from ethtool hooks, dropped
        queue_length module parameter, dropped unused debug interrupt,
        etc)

Changes made for the initial upstream post of the driver vs. the out of
tree xen.git pvops version include:

      * The driver has been put through the checkpatch.pl wringer plus
        several manual cleanup passes.
      * Moved from drivers/xen/netback to drivers/net/xen-netback.
      * Most significantly the guest transmit path (i.e. what looks like
        receive to netback) has been significantly reworked to remove
        the dependency on the out of tree PageForeign page flag (a core
        kernel patch which enables a per page destructor callback on the
        final put_page). This page flag is needed in order to implement
        a grant map based transmit path (where guest pages are mapped
        directly into SKB frags). Instead this version of netback uses
        grant copy operations into regular memory belonging to the
        backend domain. Reinstating the grant map functionality is
        something which I would like to revisit in the future.

Ian.




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Ben Hutchings Feb. 28, 2011, 6:53 p.m. UTC | #1
On Mon, 2011-02-28 at 17:27 +0000, Ian Campbell wrote:
> The following patch is the third iteration of the Xen network backend
> driver for upstream Linux.
> 
> This driver ("netback") is the host side counterpart to the frontend
> driver in drivers/net/xen-netfront.c. The PV protocol is also
> implemented by frontend drivers in other OSes too, such as the BSDs and
> even Windows.
> 
> Since this is the third posting I think it is time I started posting
> actual pull requests. The complete patch is still appended for ease of
> review.
[...]
> --- /dev/null
> +++ b/drivers/net/xen-netback/common.h
[...]
> +	/* Statistics */
> +	int rx_gso_checksum_fixup;

This should be defined as unsigned long (ideally it would be u64, but
that can't be updated atomically on 32-bit systems).

[...]
> --- /dev/null
> +++ b/drivers/net/xen-netback/interface.c
[...]
> +void xenvif_receive_skb(struct xenvif *vif, struct sk_buff *skb)
> +{
> +	netif_rx_ni(skb);
> +	vif->dev->last_rx = jiffies;
> +}

Don't update last_rx; it's only needed on slave devices of a bond, and
the bonding driver takes care of it now.

[...]
> +static int xenvif_change_mtu(struct net_device *dev, int mtu)
> +{
> +	struct xenvif *vif = netdev_priv(dev);
> +	int max = vif->can_sg ? 65535 - ETH_HLEN : ETH_DATA_LEN;
> +	if (mtu > max)
> +		return -EINVAL;
> +	dev->mtu = mtu;
> +	return 0;
> +}
[...]

Since any VLAN tag must be inserted inline, shouldn't the MTU limit be
65535 - VLAN_ETH_HLEN?

Ben.
Ian Campbell March 1, 2011, 10:06 a.m. UTC | #2
On Mon, 2011-02-28 at 18:53 +0000, Ben Hutchings wrote:
> On Mon, 2011-02-28 at 17:27 +0000, Ian Campbell wrote:

> This should be defined as unsigned long (ideally it would be u64, but
> that can't be updated atomically on 32-bit systems).
[...]
> Don't update last_rx; it's only needed on slave devices of a bond, and
> the bonding driver takes care of it now.

I made these two changes.

> [...]
> > +static int xenvif_change_mtu(struct net_device *dev, int mtu)
> > +{
> > +	struct xenvif *vif = netdev_priv(dev);
> > +	int max = vif->can_sg ? 65535 - ETH_HLEN : ETH_DATA_LEN;
> > +	if (mtu > max)
> > +		return -EINVAL;
> > +	dev->mtu = mtu;
> > +	return 0;
> > +}
> [...]
> 
> Since any VLAN tag must be inserted inline, shouldn't the MTU limit be
> 65535 - VLAN_ETH_HLEN?

In that case shouldn't the other case also be ETH_FRAME_LEN -
VLAN_ETH_HLEN?

I'm not sure what is customary wrt MTU vs VLAN (or other) overheads
under Linux, do we take the hit of the overhead for every device
regardless of VLAN being configured or not or do we expect that people
will configure the MTU as necessary when they configure a VLAN?

Netback itself will cope fine with either MTU, it's the external
connectivity which will actually matter. e.g. the usual configuration
would be (where vifX.Y represents potentially multiple netback devices):

	eth0 <-> eth0.VLAN <-> br0.VLAN <=> vifX.Y

So ultimately it will be the eth0 hardware/driver which matters.

There is a comment in net/8021q/vlan.c which says 
        /* need 4 bytes for extra VLAN header info,
         * hope the underlying device can handle it.
         */
and propagates the underlying device's MTU unmodified so it seems that
the norm is to leave the MTU at maximum assuming no VLAN overhead and
defer any required tweaking to the admin?

Alternatively you might have the VLAN on the eth0 device inside the
guest (e.g. netback<->netfront acts like a trunk link) in which case
basically the same argument applies?

I don't really mind either way so I'm happy to follow whatever the
convention is.

Ian.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Hutchings March 1, 2011, 12:28 p.m. UTC | #3
On Tue, 2011-03-01 at 10:06 +0000, Ian Campbell wrote:
> On Mon, 2011-02-28 at 18:53 +0000, Ben Hutchings wrote:
> > On Mon, 2011-02-28 at 17:27 +0000, Ian Campbell wrote:
> 
> > This should be defined as unsigned long (ideally it would be u64, but
> > that can't be updated atomically on 32-bit systems).
> [...]
> > Don't update last_rx; it's only needed on slave devices of a bond, and
> > the bonding driver takes care of it now.
> 
> I made these two changes.
> 
> > [...]
> > > +static int xenvif_change_mtu(struct net_device *dev, int mtu)
> > > +{
> > > +	struct xenvif *vif = netdev_priv(dev);
> > > +	int max = vif->can_sg ? 65535 - ETH_HLEN : ETH_DATA_LEN;
> > > +	if (mtu > max)
> > > +		return -EINVAL;
> > > +	dev->mtu = mtu;
> > > +	return 0;
> > > +}
> > [...]
> > 
> > Since any VLAN tag must be inserted inline, shouldn't the MTU limit be
> > 65535 - VLAN_ETH_HLEN?
> 
> In that case shouldn't the other case also be ETH_FRAME_LEN -
> VLAN_ETH_HLEN?
[...]

IEEE 802.3, in its infinite wisdom, says that the maximum frame length
is 1514, except that when an 802.1q tag is present it is 1518.  So the
MTU for standard Ethernet remains 1500, regardless of the use of VLANs.

Ben.