diff mbox

[RFC,v2] net: add PCINet driver

Message ID 20081029202027.GH12879@ovro.caltech.edu (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Ira Snyder Oct. 29, 2008, 8:20 p.m. UTC
This adds support to Linux for a virtual ethernet interface which uses the
PCI bus as its transport mechanism. It creates a simple, familiar, and fast
method of communication for two devices connected by a PCI interface.

I have implemented client support for the Freescale MPC8349EMDS board,
which is capable of running in PCI Agent mode (It acts like a PCI card, but
is a complete PowerPC computer, running Linux). It is almost certainly
trivially ported to any MPC83xx system. It should be a relatively small
effort to port it to any chip that can generate PCI interrupts and has at
least one PCI accessible scratch register.

It was developed to work in a CompactPCI crate of computers, one of which
is a relatively standard x86 system (acting as the host) and many PowerPC
systems (acting as clients).

RFC v1 -> RFC v2:
  * remove vim modelines
  * use net_device->name in request_irq(), for irqbalance
  * remove unnecessary wqt_get_stats(), use default get_stats() instead
  * use dev_printk() and friends
  * add message unit to MPC8349EMDS dts file

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
---

This is the third posting of this driver. I got some feedback, and have
corrected the problems. Stephen, thanks for the review! I also got some
off-list feedback from a potential user, so I believe this is worth getting
into mainline.

I'll post up a revised version about once a week as long as the changes are
minor. If they are more substantial, I'll post them as needed.

GregKH: is this worth putting into the staging tree? (I left you out of the
CC list to keep your email traffic down)

The remaining issues I see in this driver:
1) ==== Naming ====
   The name wqt originally stood for "workqueue-test" and somewhat evolved
   over time into the current driver. I'm looking for suggestions for a
   better name. It should be the same between the host and client drivers,
   to make porting the code between them easier. The drivers are /very/
   similar other than the setup code.
2) ==== IMMR Usage ====
   In the Freescale client driver, I use the whole set of board control
   registers (AKA IMMR registers). I only need a very small subset of them,
   during startup to set up the DMA window. I used the full set of
   registers so that I could share the register offsets between the drivers
   (in pcinet_hw.h)
3) ==== ioremap() of a physical address ====
   In the Freescale client driver, I called ioremap() with the physical
   address of the IMMR registers, 0xe0000000. I don't know a better way to
   get them. They are somewhat exposed in the Flat Device Tree. Suggestions
   on how to extract them are welcome.
4) ==== Hardcoded DMA Window Address ====
   In the Freescale client driver, I just hardcoded the address of the
   outbound PCI window into the DMA transfer code. It is 0x80000000.
   Suggestions on how to get this value at runtime are welcome.


Rationale behind some decisions:
1) ==== Usage of the PCINET_NET_REGISTERS_VALID bit ====
   I want to be able to use this driver from U-Boot to tftp a kernel over
   the PCI backplane, and then boot up the board. This means that the
   device descriptor memory, which lives in the client RAM, becomes invalid
   during boot.
2) ==== Buffer Descriptors in client memory ====
   I chose to put the buffer descriptors in client memory rather than host
   memory. It seemed more logical to me at the time. In my application,
   I'll have 19 boards + 1 host per cPCI chassis. The client -> host
   direction will see most of the traffic, and so I thought I would cut
   down on the number of PCI accesses needed. I'm willing to change this.
3) ==== Usage of client DMA controller for all data transfer ====
   This was done purely for speed. I tried using the CPU to transfer all
   data, and it is very slow: ~3MB/sec. Using the DMA controller gets me to
   ~40MB/sec (as tested with netperf).
4) ==== Static 1GB DMA window ====
   Maintaining a window while DMA's in flight, and then changing it seemed
   too complicated. Also, testing showed that using a static window gave me
   a ~10MB/sec speedup compared to moving the window for each skb.
5) ==== The serial driver ====
   Yes, there are two essentially separate drivers here. I needed a method
   to communicate with the U-Boot bootloader on these boards without
   plugging in a serial cable. With 19 clients + 1 host per chassis, the
   cable clutter is worth avoiding. Since everything is connected via the
   PCI bus anyway, I used that. A virtual serial port was simple to
   implement using the messaging unit hardware that I used for the network
   driver.

I'll post both U-Boot drivers to their mailing list once this driver is
finalized.

Thanks,
Ira


 arch/powerpc/boot/dts/mpc834x_mds.dts |    7 +
 drivers/net/Kconfig                   |   34 +
 drivers/net/Makefile                  |    3 +
 drivers/net/pcinet.h                  |   75 ++
 drivers/net/pcinet_fsl.c              | 1351 ++++++++++++++++++++++++++++++++
 drivers/net/pcinet_host.c             | 1383 +++++++++++++++++++++++++++++++++
 drivers/net/pcinet_hw.h               |   77 ++
 7 files changed, 2930 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/pcinet.h
 create mode 100644 drivers/net/pcinet_fsl.c
 create mode 100644 drivers/net/pcinet_host.c
 create mode 100644 drivers/net/pcinet_hw.h

Comments

stephen hemminger Oct. 29, 2008, 8:25 p.m. UTC | #1
On Wed, 29 Oct 2008 13:20:27 -0700
Ira Snyder <iws@ovro.caltech.edu> wrote:

> This adds support to Linux for a virtual ethernet interface which uses the
> PCI bus as its transport mechanism. It creates a simple, familiar, and fast
> method of communication for two devices connected by a PCI interface.
> 
> I have implemented client support for the Freescale MPC8349EMDS board,
> which is capable of running in PCI Agent mode (It acts like a PCI card, but
> is a complete PowerPC computer, running Linux). It is almost certainly
> trivially ported to any MPC83xx system. It should be a relatively small
> effort to port it to any chip that can generate PCI interrupts and has at
> least one PCI accessible scratch register.
> 
> It was developed to work in a CompactPCI crate of computers, one of which
> is a relatively standard x86 system (acting as the host) and many PowerPC
> systems (acting as clients).
> 
> RFC v1 -> RFC v2:
>   * remove vim modelines
>   * use net_device->name in request_irq(), for irqbalance
>   * remove unnecessary wqt_get_stats(), use default get_stats() instead
>   * use dev_printk() and friends
>   * add message unit to MPC8349EMDS dts file
> 
> Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
> ---
> 
> This is the third posting of this driver. I got some feedback, and have
> corrected the problems. Stephen, thanks for the review! I also got some
> off-list feedback from a potential user, so I believe this is worth getting
> into mainline.
> 
> I'll post up a revised version about once a week as long as the changes are
> minor. If they are more substantial, I'll post them as needed.
> 
> GregKH: is this worth putting into the staging tree? (I left you out of the
> CC list to keep your email traffic down)
> 
> The remaining issues I see in this driver:
> 1) ==== Naming ====
>    The name wqt originally stood for "workqueue-test" and somewhat evolved
>    over time into the current driver. I'm looking for suggestions for a
>    better name. It should be the same between the host and client drivers,
>    to make porting the code between them easier. The drivers are /very/
>    similar other than the setup code.
> 2) ==== IMMR Usage ====
>    In the Freescale client driver, I use the whole set of board control
>    registers (AKA IMMR registers). I only need a very small subset of them,
>    during startup to set up the DMA window. I used the full set of
>    registers so that I could share the register offsets between the drivers
>    (in pcinet_hw.h)
> 3) ==== ioremap() of a physical address ====
>    In the Freescale client driver, I called ioremap() with the physical
>    address of the IMMR registers, 0xe0000000. I don't know a better way to
>    get them. They are somewhat exposed in the Flat Device Tree. Suggestions
>    on how to extract them are welcome.
> 4) ==== Hardcoded DMA Window Address ====
>    In the Freescale client driver, I just hardcoded the address of the
>    outbound PCI window into the DMA transfer code. It is 0x80000000.
>    Suggestions on how to get this value at runtime are welcome.
> 
> 
> Rationale behind some decisions:
> 1) ==== Usage of the PCINET_NET_REGISTERS_VALID bit ====
>    I want to be able to use this driver from U-Boot to tftp a kernel over
>    the PCI backplane, and then boot up the board. This means that the
>    device descriptor memory, which lives in the client RAM, becomes invalid
>    during boot.
> 2) ==== Buffer Descriptors in client memory ====
>    I chose to put the buffer descriptors in client memory rather than host
>    memory. It seemed more logical to me at the time. In my application,
>    I'll have 19 boards + 1 host per cPCI chassis. The client -> host
>    direction will see most of the traffic, and so I thought I would cut
>    down on the number of PCI accesses needed. I'm willing to change this.
> 3) ==== Usage of client DMA controller for all data transfer ====
>    This was done purely for speed. I tried using the CPU to transfer all
>    data, and it is very slow: ~3MB/sec. Using the DMA controller gets me to
>    ~40MB/sec (as tested with netperf).
> 4) ==== Static 1GB DMA window ====
>    Maintaining a window while DMA's in flight, and then changing it seemed
>    too complicated. Also, testing showed that using a static window gave me
>    a ~10MB/sec speedup compared to moving the window for each skb.
> 5) ==== The serial driver ====
>    Yes, there are two essentially separate drivers here. I needed a method
>    to communicate with the U-Boot bootloader on these boards without
>    plugging in a serial cable. With 19 clients + 1 host per chassis, the
>    cable clutter is worth avoiding. Since everything is connected via the
>    PCI bus anyway, I used that. A virtual serial port was simple to
>    implement using the messaging unit hardware that I used for the network
>    driver.
> 
> I'll post both U-Boot drivers to their mailing list once this driver is
> finalized.
> 
> Thanks,
> Ira
> 
> 
>  arch/powerpc/boot/dts/mpc834x_mds.dts |    7 +
>  drivers/net/Kconfig                   |   34 +
>  drivers/net/Makefile                  |    3 +
>  drivers/net/pcinet.h                  |   75 ++
>  drivers/net/pcinet_fsl.c              | 1351 ++++++++++++++++++++++++++++++++
>  drivers/net/pcinet_host.c             | 1383 +++++++++++++++++++++++++++++++++
>  drivers/net/pcinet_hw.h               |   77 ++
>  7 files changed, 2930 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/net/pcinet.h
>  create mode 100644 drivers/net/pcinet_fsl.c
>  create mode 100644 drivers/net/pcinet_host.c
>  create mode 100644 drivers/net/pcinet_hw.h
> 
> diff --git a/arch/powerpc/boot/dts/mpc834x_mds.dts b/arch/powerpc/boot/dts/mpc834x_mds.dts
> index c986c54..3bc8975 100644
> --- a/arch/powerpc/boot/dts/mpc834x_mds.dts
> +++ b/arch/powerpc/boot/dts/mpc834x_mds.dts
> @@ -104,6 +104,13 @@
>  			mode = "cpu";
>  		};
>  
> +		message-unit@8030 {
> +			compatible = "fsl,mpc8349-mu";
> +			reg = <0x8030 0xd0>;
> +			interrupts = <69 0x8>;
> +			interrupt-parent = <&ipic>;
> +		};
> +
>  		dma@82a8 {
>  			#address-cells = <1>;
>  			#size-cells = <1>;
> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> index f749b40..eef7af7 100644
> --- a/drivers/net/Kconfig
> +++ b/drivers/net/Kconfig
> @@ -2279,6 +2279,40 @@ config UGETH_TX_ON_DEMAND
>  	bool "Transmit on Demand support"
>  	depends on UCC_GETH
>  
> +config PCINET_FSL
> +	tristate "PCINet Virtual Ethernet over PCI support (Freescale)"
> +	depends on MPC834x_MDS && !PCI
> +	select DMA_ENGINE
> +	select FSL_DMA
> +	help
> +	  When running as a PCI Agent, this driver will create a virtual
> +	  ethernet link running over the PCI bus, allowing simplified
> +	  communication with the host system. The host system will need
> +	  to use the corresponding driver.
> +
> +	  If in doubt, say N.
> +
> +config PCINET_HOST
> +	tristate "PCINet Virtual Ethernet over PCI support (Host)"
> +	depends on PCI
> +	help
> +	  This driver will let you communicate with a PCINet client device
> +	  using a virtual ethernet link running over the PCI bus. This
> +	  allows simplified communication with the client system.
> +
> +	  This is inteded for use in a system that has a crate full of
> +	  computers running Linux, all connected by a PCI backplane.
> +
> +	  If in doubt, say N.
> +
> +config PCINET_DISABLE_CHECKSUM
> +	bool "Disable packet checksumming"
> +	depends on PCINET_FSL || PCINET_HOST
> +	default n
> +	help
> +	  Disable packet checksumming on packets received by the PCINet
> +	  driver. This gives a possible speed boost.
> +
>  config MV643XX_ETH
>  	tristate "Marvell Discovery (643XX) and Orion ethernet support"
>  	depends on MV64360 || MV64X60 || (PPC_MULTIPLATFORM && PPC32) || PLAT_ORION
> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> index f19acf8..c6fbafc 100644
> --- a/drivers/net/Makefile
> +++ b/drivers/net/Makefile
> @@ -30,6 +30,9 @@ gianfar_driver-objs := gianfar.o \
>  obj-$(CONFIG_UCC_GETH) += ucc_geth_driver.o
>  ucc_geth_driver-objs := ucc_geth.o ucc_geth_mii.o ucc_geth_ethtool.o
>  
> +obj-$(CONFIG_PCINET_FSL) += pcinet_fsl.o
> +obj-$(CONFIG_PCINET_HOST) += pcinet_host.o
> +
>  #
>  # link order important here
>  #
> diff --git a/drivers/net/pcinet.h b/drivers/net/pcinet.h
> new file mode 100644
> index 0000000..66d2cba
> --- /dev/null
> +++ b/drivers/net/pcinet.h
> @@ -0,0 +1,75 @@
> +/*
> + * Shared Definitions for the PCINet / PCISerial drivers
> + *
> + * Copyright (c) 2008 Ira W. Snyder <iws@ovro.caltech.edu>
> + *
> + * Heavily inspired by the drivers/net/fs_enet driver
> + *
> + * This file is licensed under the terms of the GNU General Public License
> + * version 2. This program is licensed "as is" without any warranty of any
> + * kind, whether express or implied.
> + */
> +
> +#ifndef PCINET_H
> +#define PCINET_H
> +
> +#include <linux/kernel.h>
> +#include <linux/if_ether.h>
> +
> +/* Ring and Frame size -- these must match between the drivers */
> +#define PH_RING_SIZE	(64)
> +#define PH_MAX_FRSIZE	(64 * 1024)
> +#define PH_MAX_MTU	(PH_MAX_FRSIZE - ETH_HLEN)
> +
> +struct circ_buf_desc {
> +	__le32 sc;
> +	__le32 len;
> +	__le32 addr;
> +} __attribute__((__packed__));
> +typedef struct circ_buf_desc cbd_t;

Don't use typedef. See chapter 5 of Documentation/CodingStyle

Do you really need packed here, sometime gcc generates much worse code
on needlessly packed structures?
Ira Snyder Oct. 29, 2008, 8:50 p.m. UTC | #2
On Wed, Oct 29, 2008 at 01:25:06PM -0700, Stephen Hemminger wrote:
> On Wed, 29 Oct 2008 13:20:27 -0700
> Ira Snyder <iws@ovro.caltech.edu> wrote:

Big snip...

> > diff --git a/drivers/net/pcinet.h b/drivers/net/pcinet.h
> > new file mode 100644
> > index 0000000..66d2cba
> > --- /dev/null
> > +++ b/drivers/net/pcinet.h
> > @@ -0,0 +1,75 @@
> > +/*
> > + * Shared Definitions for the PCINet / PCISerial drivers
> > + *
> > + * Copyright (c) 2008 Ira W. Snyder <iws@ovro.caltech.edu>
> > + *
> > + * Heavily inspired by the drivers/net/fs_enet driver
> > + *
> > + * This file is licensed under the terms of the GNU General Public License
> > + * version 2. This program is licensed "as is" without any warranty of any
> > + * kind, whether express or implied.
> > + */
> > +
> > +#ifndef PCINET_H
> > +#define PCINET_H
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/if_ether.h>
> > +
> > +/* Ring and Frame size -- these must match between the drivers */
> > +#define PH_RING_SIZE	(64)
> > +#define PH_MAX_FRSIZE	(64 * 1024)
> > +#define PH_MAX_MTU	(PH_MAX_FRSIZE - ETH_HLEN)
> > +
> > +struct circ_buf_desc {
> > +	__le32 sc;
> > +	__le32 len;
> > +	__le32 addr;
> > +} __attribute__((__packed__));
> > +typedef struct circ_buf_desc cbd_t;
> 
> Don't use typedef. See chapter 5 of Documentation/CodingStyle
> 

I know about this. I was following the example set forth in
drivers/net/fs_enet. I tried to make this driver somewhat similar to
that driver, but I'm not against changing this to a struct everywhere.

> Do you really need packed here, sometime gcc generates much worse code
> on needlessly packed structures?

I'm pretty sure I do. I share this structure between both devices over
the PCI bus. This is where the physical addresses of the skb's go so
that the DMA controller can transfer them.

As an example, let's say I have a machine that places 32bit values on
64bit boundaries. AFAIK, the Freescale places 32bit values on 32bit
boundaries. Now the two of them disagree about where the fields of the
buffer descriptor are.

Of course, I may be wrong. :) Comments?

Thanks for the review,
Ira
Scott Wood Oct. 29, 2008, 8:54 p.m. UTC | #3
Ira Snyder wrote:
> I know about this. I was following the example set forth in
> drivers/net/fs_enet. 

I recommend against that. :-)

-Scott
Ira Snyder Oct. 29, 2008, 9:13 p.m. UTC | #4
On Wed, Oct 29, 2008 at 03:54:46PM -0500, Scott Wood wrote:
> Ira Snyder wrote:
>> I know about this. I was following the example set forth in
>> drivers/net/fs_enet. 
>
> I recommend against that. :-)

Great, now you tell me :)

I'll go ahead change the typedef to a struct. Hopefully some of the
other inspiration I took wasn't too bad. I'm totally open to suggestions
on improvements for this driver.

Ira
Matt Sealey Oct. 29, 2008, 9:43 p.m. UTC | #5
Ira Snyder wrote:
> 
> I'll go ahead change the typedef to a struct. Hopefully some of the
> other inspiration I took wasn't too bad. I'm totally open to suggestions
> on improvements for this driver.

Not a suggestion for improvement, but I have to say this is by far the
single coolest thing I have EVER seen submitted to mainline :)
Arnd Bergmann Nov. 4, 2008, 12:09 p.m. UTC | #6
On Wednesday 29 October 2008, Ira Snyder wrote:
> This adds support to Linux for a virtual ethernet interface which uses the
> PCI bus as its transport mechanism. It creates a simple, familiar, and fast
> method of communication for two devices connected by a PCI interface.

Very interesting. I have seen multiple such drivers, and still plan to
have one as well for the cell/axon platform.

> This is the third posting of this driver. I got some feedback, and have
> corrected the problems. Stephen, thanks for the review! I also got some
> off-list feedback from a potential user, so I believe this is worth getting
> into mainline.

Sorry for not having replied earlier, I did not see first postings.
While I did not see any major problems with your code, I think we should
actually do it in a different way, which I have talked about at the
Linux plumbers conference, and will again at the UKUUG Linux conference
this weekend. You can find my slides from the previous talk at
http://userweb.kernel.org/%7Earnd/papers/plumbers08/plumbers-slides.pdf
 
Since you have code and I only have plans, I suppose I can't make you
wait for me to get a driver for your platform, but I hope we can join
forces in the future and share code in the future.

My fundamental criticism of your code is that it is not flexible enough.
We have a number of platforms that act as a PCI, PCIe or similar kind
of device and want to transfer network data. At the same time, some of
these want to use the same infrastructure for non-network data like
MPI (ibverbs), block device or file system (9p). Even in your limited
setup, it seems that you should be able to share more code between the
two implementations you have posted here. When looking at the code, I
found a lot of duplication between the drivers that you should be able
to avoid.

My suggestion is to base this on the virtio infrastructure, and some
of my colleagues at IBM already implemented a driver that uses
virtio-net on a PCIe connection.

	Arnd <><

Now a few comments on your code:


> diff --git a/arch/powerpc/boot/dts/mpc834x_mds.dts b/arch/powerpc/boot/dts/mpc834x_mds.dts
> index c986c54..3bc8975 100644
> --- a/arch/powerpc/boot/dts/mpc834x_mds.dts
> +++ b/arch/powerpc/boot/dts/mpc834x_mds.dts
> @@ -104,6 +104,13 @@
>  			mode = "cpu";
>  		};
>  
> +		message-unit@8030 {
> +			compatible = "fsl,mpc8349-mu";
> +			reg = <0x8030 0xd0>;
> +			interrupts = <69 0x8>;
> +			interrupt-parent = <&ipic>;
> +		};

What is a message unit? Is this the mailbox you use in the driver?
We are facing a similar problem on Axon regarding probing of
the virtual device because you can't easily tell from the device
tree whether the machine is connected to something that is trying
to communicate.

The message-unit does not seem to be network-specific, so it's not
particularly nice to have the network unit grab that of_device.

> +config PCINET_FSL
> +	tristate "PCINet Virtual Ethernet over PCI support (Freescale)"
> +	depends on MPC834x_MDS && !PCI
> +	select DMA_ENGINE
> +	select FSL_DMA
> +	help
> +	  When running as a PCI Agent, this driver will create a virtual
> +	  ethernet link running over the PCI bus, allowing simplified
> +	  communication with the host system. The host system will need
> +	  to use the corresponding driver.
> +
> +	  If in doubt, say N.

Why 'depends on !PCI'? This means that you cannot build a kernel that
is able to run both as host and endpoint for PCInet, right?

> +config PCINET_HOST
> +	tristate "PCINet Virtual Ethernet over PCI support (Host)"
> +	depends on PCI
> +	help
> +	  This driver will let you communicate with a PCINet client device
> +	  using a virtual ethernet link running over the PCI bus. This
> +	  allows simplified communication with the client system.
> +
> +	  This is inteded for use in a system that has a crate full of
> +	  computers running Linux, all connected by a PCI backplane.
> +
> +	  If in doubt, say N.

Is this meant to be portable across different hardware implementations?
If the driver can only work with PCINET_FSL on the other side, you
should probably mention that in the description.

> +config PCINET_DISABLE_CHECKSUM
> +	bool "Disable packet checksumming"
> +	depends on PCINET_FSL || PCINET_HOST
> +	default n
> +	help
> +	  Disable packet checksumming on packets received by the PCINet
> +	  driver. This gives a possible speed boost.

Why make this one optional? Is there ever a reason to enable checksumming?
If there is, how about making it tunable with ethtool?

> +struct circ_buf_desc {
> +	__le32 sc;
> +	__le32 len;
> +	__le32 addr;
> +} __attribute__((__packed__));

It would be useful to always force aligning the desciptors to the whole
32 bit and avoid the packing here. Unaligned accesses are inefficient on
many systems.

> +typedef struct circ_buf_desc cbd_t;

Also, don't pass structures by value if they don't fit into one or
two registers.

> +/* Buffer Descriptor Accessors */
> +#define CBDW_SC(_cbd, _sc) iowrite32((_sc), &(_cbd)->sc)
> +#define CBDW_LEN(_cbd, _len) iowrite32((_len), &(_cbd)->len)
> +#define CBDW_ADDR(_cbd, _addr) iowrite32((_addr), &(_cbd)->addr)
> +
> +#define CBDR_SC(_cbd) ioread32(&(_cbd)->sc)
> +#define CBDR_LEN(_cbd) ioread32(&(_cbd)->len)
> +#define CBDR_ADDR(_cbd) ioread32(&(_cbd)->addr)

We have found that accessing remote descriptors using mmio read is
rather slow, and changed the code to always do local reads and
remote writes.

On the host side, I would argue that you should use out_le32 rather
than iowrite32, because you are not operating on a PCI device but
an OF device.

> +/* IMMR Accessor Helpers */
> +#define IMMR_R32(_off) ioread32(priv->immr+(_off))
> +#define IMMR_W32(_off, _val) iowrite32((_val), priv->immr+(_off))
> +#define IMMR_R32BE(_off) ioread32be(priv->immr+(_off))
> +#define IMMR_W32BE(_off, _val) iowrite32be((_val), priv->immr+(_off))

IIRC, the IMMR is a platform resource, so the system should map those
registers already and provide accessor functions for the registers
you need. Simply allowing to pass an offset in here does not look
clean.


> +static void wqtuart_rx_char(struct uart_port *port, const char ch);
> +static void wqtuart_stop_tx(struct uart_port *port);

You should try to avoid forward declarations for static functions.
If you order the function implementation correctly, that will
also give you the expected reading order in the driver.

> +struct wqt_dev;
> +typedef void (*wqt_irqhandler_t)(struct wqt_dev *);

Much more importantly, never do forward declarations for
global functions!

These belong into a header that is included by both the
user and the definition.


> +struct wqt_dev {
> +	/*--------------------------------------------------------------------*/
> +	/* OpenFirmware Infrastructure                                        */
> +	/*--------------------------------------------------------------------*/
> +	struct of_device *op;
> +	struct device *dev;

Why the dev? You can always get that from the of_device, right?

> +	int irq;
> +	void __iomem *immr;

I don't think immr is device specific.

> +	struct mutex irq_mutex;
> +	int interrupt_count;
> +
> +	spinlock_t irq_lock;
> +	struct wqt_irqhandlers handlers;
> +
> +	/*--------------------------------------------------------------------*/
> +	/* UART Device Infrastructure                                         */
> +	/*--------------------------------------------------------------------*/
> +	struct uart_port port;
> +	bool uart_rx_enabled;
> +	bool uart_open;

hmm, if you need a uart, that sounds like it should be a separate driver
altogether. What is the relation between the network interface and the UART?

> +	struct workqueue_struct *wq;

A per-device pointer to a workqueue_struct is unusual. What are you doing
this for? How about using the system work queue?

> +	/*--------------------------------------------------------------------*/
> +	/* Ethernet Device Infrastructure                                     */
> +	/*--------------------------------------------------------------------*/
> +	struct net_device *ndev;

Why make this a separate structure? If you have one of these per net_device,
you should embed the net_device into your own structure.

> +	struct tasklet_struct tx_complete_tasklet;

Using a tasklet for tx processing sounds fishy because most of the
network code already runs at softirq time. You do not gain anything
by another softirq context.

> +/*----------------------------------------------------------------------------*/
> +/* Status Register Helper Operations                                          */
> +/*----------------------------------------------------------------------------*/
> +
> +static DEFINE_SPINLOCK(status_lock);
> +
> +static void wqtstatus_setbit(struct wqt_dev *priv, u32 bit)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&status_lock, flags);
> +	IMMR_W32(OMR1_OFFSET, IMMR_R32(OMR1_OFFSET) | bit);
> +	spin_unlock_irqrestore(&status_lock, flags);
> +}
> +
> +static void wqtstatus_clrbit(struct wqt_dev *priv, u32 bit)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&status_lock, flags);
> +	IMMR_W32(OMR1_OFFSET, IMMR_R32(OMR1_OFFSET) & ~bit);
> +	spin_unlock_irqrestore(&status_lock, flags);
> +}

This looks misplaced, as mentioned before.

> +static int wqtstatus_remote_testbit(struct wqt_dev *priv, u32 bit)
> +{
> +	return IMMR_R32(IMR1_OFFSET) & bit;
> +}

Are you sure that do not need a spinlock here? The lock also
prevents reordering the device access arount the code using
the mmio data.

> +/*----------------------------------------------------------------------------*/
> +/* Message Sending and Processing Operations                                  */
> +/*----------------------------------------------------------------------------*/
> +
> +static irqreturn_t wqt_interrupt(int irq, void *dev_id)
> +{
> +	struct wqt_dev *priv = dev_id;
> +	u32 imisr, idr;
> +	unsigned long flags;
> +
> +	imisr = IMMR_R32(IMISR_OFFSET);
> +	idr = IMMR_R32(IDR_OFFSET);
> +
> +	if (!(imisr & 0x8))
> +		return IRQ_NONE;
> +
> +	/* Clear all of the interrupt sources, we'll handle them next */
> +	IMMR_W32(IDR_OFFSET, idr);
> +
> +	/* Lock over all of the handlers, so they cannot get called when
> +	 * the code doesn't expect them to be called */
> +	spin_lock_irqsave(&priv->irq_lock, flags);

If this is in an interrupt handler, why disable the interrupts again?
The same comment applies to many of the other places where you
use spin_lock_irqsave rather than spin_lock or spin_lock_irq.

> +/* NOTE: All handlers are called with priv->irq_lock held */
> +
> +static void empty_handler(struct wqt_dev *priv)
> +{
> +	/* Intentionally left empty */
> +}
> +
> +static void net_start_req_handler(struct wqt_dev *priv)
> +{
> +	schedule_work(&priv->net_start_work);
> +}
> +
> +static void net_start_ack_handler(struct wqt_dev *priv)
> +{
> +	complete(&priv->net_start_completion);
> +}
> +
> +static void net_stop_req_handler(struct wqt_dev *priv)
> +{
> +	schedule_work(&priv->net_stop_work);
> +}
> +
> +static void net_stop_ack_handler(struct wqt_dev *priv)
> +{
> +	complete(&priv->net_stop_completion);
> +}
> +
> +static void net_tx_complete_handler(struct wqt_dev *priv)
> +{
> +	tasklet_schedule(&priv->tx_complete_tasklet);
> +}
> +
> +static void net_rx_packet_handler(struct wqt_dev *priv)
> +{
> +	wqtstatus_setbit(priv, PCINET_NET_RXINT_OFF);
> +	netif_rx_schedule(priv->ndev, &priv->napi);
> +}
> +
> +static void uart_rx_ready_handler(struct wqt_dev *priv)
> +{
> +	wqtuart_rx_char(&priv->port, IMMR_R32(IMR0_OFFSET) & 0xff);
> +	IMMR_W32(ODR_OFFSET, UART_TX_EMPTY_DBELL);
> +}
> +
> +static void uart_tx_empty_handler(struct wqt_dev *priv)
> +{
> +	priv->uart_tx_ready = true;
> +	wake_up(&priv->uart_tx_wait);
> +}

Since these all seem to be trivial functions, why go through the
indirect function pointers at all? It seems that wqt_interrupt
could do this just as well.

> +
> +static int wqt_request_irq(struct wqt_dev *priv)
> +{
> +	int ret = 0;

no point initializing ret.

> +	mutex_lock(&priv->irq_mutex);

What data does the irq_mutex protect?

	Arnd <><
Ira Snyder Nov. 4, 2008, 5:34 p.m. UTC | #7
On Tue, Nov 04, 2008 at 01:09:25PM +0100, Arnd Bergmann wrote:
> On Wednesday 29 October 2008, Ira Snyder wrote:
> > This adds support to Linux for a virtual ethernet interface which uses the
> > PCI bus as its transport mechanism. It creates a simple, familiar, and fast
> > method of communication for two devices connected by a PCI interface.
> 
> Very interesting. I have seen multiple such drivers, and still plan to
> have one as well for the cell/axon platform.
> 
> > This is the third posting of this driver. I got some feedback, and have
> > corrected the problems. Stephen, thanks for the review! I also got some
> > off-list feedback from a potential user, so I believe this is worth getting
> > into mainline.
> 
> Sorry for not having replied earlier, I did not see first postings.
> While I did not see any major problems with your code, I think we should
> actually do it in a different way, which I have talked about at the
> Linux plumbers conference, and will again at the UKUUG Linux conference
> this weekend. You can find my slides from the previous talk at
> http://userweb.kernel.org/%7Earnd/papers/plumbers08/plumbers-slides.pdf
>  

I'll read through these slides as soon as I'm finished replying to this
email :)

> Since you have code and I only have plans, I suppose I can't make you
> wait for me to get a driver for your platform, but I hope we can join
> forces in the future and share code in the future.
> 

I hope so too. I wrote this because I needed something working. I'm
completely happy to help write code for another implementation.

> My fundamental criticism of your code is that it is not flexible enough.
> We have a number of platforms that act as a PCI, PCIe or similar kind
> of device and want to transfer network data. At the same time, some of
> these want to use the same infrastructure for non-network data like
> MPI (ibverbs), block device or file system (9p). Even in your limited
> setup, it seems that you should be able to share more code between the
> two implementations you have posted here. When looking at the code, I
> found a lot of duplication between the drivers that you should be able
> to avoid.
> 

I completely agree. I made the host and client drivers as similar as
possible on purpose, to increase my own code sharing. In fact, if you
diff them, you'll see that they are almost identical.

I decided that it would be better to post them up for review than try to
fix every little problem.

> My suggestion is to base this on the virtio infrastructure, and some
> of my colleagues at IBM already implemented a driver that uses
> virtio-net on a PCIe connection.
> 

To be honest, I couldn't get that figured out. I looked at that code for
a couple of weeks and couldn't make a lot of sense out of it. I couldn't
find much really good documentation, either. I'd love to re-use more
code than I have.

> 	Arnd <><
> 
> Now a few comments on your code:
> 
> 
> > diff --git a/arch/powerpc/boot/dts/mpc834x_mds.dts b/arch/powerpc/boot/dts/mpc834x_mds.dts
> > index c986c54..3bc8975 100644
> > --- a/arch/powerpc/boot/dts/mpc834x_mds.dts
> > +++ b/arch/powerpc/boot/dts/mpc834x_mds.dts
> > @@ -104,6 +104,13 @@
> >  			mode = "cpu";
> >  		};
> >  
> > +		message-unit@8030 {
> > +			compatible = "fsl,mpc8349-mu";
> > +			reg = <0x8030 0xd0>;
> > +			interrupts = <69 0x8>;
> > +			interrupt-parent = <&ipic>;
> > +		};
> 
> What is a message unit? Is this the mailbox you use in the driver?
> We are facing a similar problem on Axon regarding probing of
> the virtual device because you can't easily tell from the device
> tree whether the machine is connected to something that is trying
> to communicate.
> 
> The message-unit does not seem to be network-specific, so it's not
> particularly nice to have the network unit grab that of_device.
> 

Yes, the message unit is how the Freescale reference manual describes
this hardware. It is the mailbox hardware to generate interrupts, plus a
few registers to transfer data.

The registers aren't network specific, but they are needed for this
driver. That is how the driver generates PCI interrupts to notify the
other side of the connection it has new data, etc.

> > +config PCINET_FSL
> > +	tristate "PCINet Virtual Ethernet over PCI support (Freescale)"
> > +	depends on MPC834x_MDS && !PCI
> > +	select DMA_ENGINE
> > +	select FSL_DMA
> > +	help
> > +	  When running as a PCI Agent, this driver will create a virtual
> > +	  ethernet link running over the PCI bus, allowing simplified
> > +	  communication with the host system. The host system will need
> > +	  to use the corresponding driver.
> > +
> > +	  If in doubt, say N.
> 
> Why 'depends on !PCI'? This means that you cannot build a kernel that
> is able to run both as host and endpoint for PCInet, right?
> 

Yes, that is correct. I did this because the Linux PCI code does some
relatively nasty things in agent mode. One thing that consistently
crashed my box was running through the quirks list and trying to
re-initialize an e100 that was in my box.

Remember, this is a PCI agent. It shouldn't be re-initializing the other
hardware on the PCI bus.

> > +config PCINET_HOST
> > +	tristate "PCINet Virtual Ethernet over PCI support (Host)"
> > +	depends on PCI
> > +	help
> > +	  This driver will let you communicate with a PCINet client device
> > +	  using a virtual ethernet link running over the PCI bus. This
> > +	  allows simplified communication with the client system.
> > +
> > +	  This is inteded for use in a system that has a crate full of
> > +	  computers running Linux, all connected by a PCI backplane.
> > +
> > +	  If in doubt, say N.
> 
> Is this meant to be portable across different hardware implementations?
> If the driver can only work with PCINET_FSL on the other side, you
> should probably mention that in the description.
> 

I had hoped other people would come along and help port it to other
hardware, like you yourself are suggesting. :) I'll add to the
description.

> > +config PCINET_DISABLE_CHECKSUM
> > +	bool "Disable packet checksumming"
> > +	depends on PCINET_FSL || PCINET_HOST
> > +	default n
> > +	help
> > +	  Disable packet checksumming on packets received by the PCINet
> > +	  driver. This gives a possible speed boost.
> 
> Why make this one optional? Is there ever a reason to enable checksumming?
> If there is, how about making it tunable with ethtool?
> 

I left it optional so I could turn it on and off easily. I have no
strong feelings on keeping it optional.

Does the PCI bus reliably transfer data? I'm not sure. I left it there
so that we could at least turn on checksumming if there was a problem.

> > +struct circ_buf_desc {
> > +	__le32 sc;
> > +	__le32 len;
> > +	__le32 addr;
> > +} __attribute__((__packed__));
> 
> It would be useful to always force aligning the desciptors to the whole
> 32 bit and avoid the packing here. Unaligned accesses are inefficient on
> many systems.
> 

I don't really know how to do that. I got a warning here from sparse
telling me something about expensive pointer subtraction. Adding a dummy
32bit padding variable got rid of the warning, but I didn't change the
driver.

> > +typedef struct circ_buf_desc cbd_t;
> 
> Also, don't pass structures by value if they don't fit into one or
> two registers.
> 

These are only used for pointers to the buffer descriptors (in RAM on
the Freescale) that hold packet information. I never copy them directly.

> > +/* Buffer Descriptor Accessors */
> > +#define CBDW_SC(_cbd, _sc) iowrite32((_sc), &(_cbd)->sc)
> > +#define CBDW_LEN(_cbd, _len) iowrite32((_len), &(_cbd)->len)
> > +#define CBDW_ADDR(_cbd, _addr) iowrite32((_addr), &(_cbd)->addr)
> > +
> > +#define CBDR_SC(_cbd) ioread32(&(_cbd)->sc)
> > +#define CBDR_LEN(_cbd) ioread32(&(_cbd)->len)
> > +#define CBDR_ADDR(_cbd) ioread32(&(_cbd)->addr)
> 
> We have found that accessing remote descriptors using mmio read is
> rather slow, and changed the code to always do local reads and
> remote writes.
> 

Interesting. I don't know how you would get network speed doing this.
X86 systems don't have a DMA conttroller. The entire purpose of making
the Freescale do all the copying was to use its DMA controller.

Using the DMA controller to transfer all of the data took my transfer
speed from ~3MB/sec to ~45MB/sec. While that is a good increase, it
could be better. I should be able to hit close to 133MB/sec (the limit
of PCI)

> On the host side, I would argue that you should use out_le32 rather
> than iowrite32, because you are not operating on a PCI device but
> an OF device.
> 

Yeah, you're probably correct. I can't remember if I tried this and had
problems or not.

> > +/* IMMR Accessor Helpers */
> > +#define IMMR_R32(_off) ioread32(priv->immr+(_off))
> > +#define IMMR_W32(_off, _val) iowrite32((_val), priv->immr+(_off))
> > +#define IMMR_R32BE(_off) ioread32be(priv->immr+(_off))
> > +#define IMMR_W32BE(_off, _val) iowrite32be((_val), priv->immr+(_off))
> 
> IIRC, the IMMR is a platform resource, so the system should map those
> registers already and provide accessor functions for the registers
> you need. Simply allowing to pass an offset in here does not look
> clean.
> 
> 

Correct. This was done to make both sides as identical as possible. The
Freescale exports the entire 1MB block of IMMR registers at PCI BAR0. So
I have to use the offsets on the host side.

From the client side, I could just map what I need, but that would make
the two drivers diverge. I was trying to keep them the same.

> > +static void wqtuart_rx_char(struct uart_port *port, const char ch);
> > +static void wqtuart_stop_tx(struct uart_port *port);
> 
> You should try to avoid forward declarations for static functions.
> If you order the function implementation correctly, that will
> also give you the expected reading order in the driver.
> 

Yep, I tried to do this. I couldn't figure out a sane ordering that
would work. I tried to keep the network and uart as seperate as possible
in the code.

> > +struct wqt_dev;
> > +typedef void (*wqt_irqhandler_t)(struct wqt_dev *);
> 
> Much more importantly, never do forward declarations for
> global functions!
> 
> These belong into a header that is included by both the
> user and the definition.
> 
> 

Ok.

> > +struct wqt_dev {
> > +	/*--------------------------------------------------------------------*/
> > +	/* OpenFirmware Infrastructure                                        */
> > +	/*--------------------------------------------------------------------*/
> > +	struct of_device *op;
> > +	struct device *dev;
> 
> Why the dev? You can always get that from the of_device, right?
> 

Yes. I stored it there to make it identical to the host driver. By doing
this, both drivers have code that says "dev_debug(priv->dev, ...)"
rather than:

Host:
dev_debug(&priv->pdev->dev, ...)

Freescale:
dev_debug(&priv->op->dev, ...)

> > +	int irq;
> > +	void __iomem *immr;
> 
> I don't think immr is device specific.
> 

Correct, it is not. Both the PCI subsystem and the OF subsystem will
call probe() when a new device is detected. I stored it here so I could
ioremap() them at probe() time, and to keep both drivers' code the same.

> > +	struct mutex irq_mutex;
> > +	int interrupt_count;
> > +
> > +	spinlock_t irq_lock;
> > +	struct wqt_irqhandlers handlers;
> > +
> > +	/*--------------------------------------------------------------------*/
> > +	/* UART Device Infrastructure                                         */
> > +	/*--------------------------------------------------------------------*/
> > +	struct uart_port port;
> > +	bool uart_rx_enabled;
> > +	bool uart_open;
> 
> hmm, if you need a uart, that sounds like it should be a separate driver
> altogether. What is the relation between the network interface and the UART?
> 

Yes, I agree. How do you make two Linux drivers that can be loaded for
the same hardware at the same time? :) AFAIK, you cannot.

I NEED two functions accessible at the same time, network (to transfer
data) and uart (to control my bootloader).

I use the uart to interact with the bootloader (U-Boot) and tell it
where to tftp a kernel. I use the network to transfer the kernel.

So you see, I really do need them both at the same time. If you know a
better way to do this, please let me know!

It was possible to write seperate U-Boot drivers, but only by being
careful to not conflict in my usage of the hardware.

> > +	struct workqueue_struct *wq;
> 
> A per-device pointer to a workqueue_struct is unusual. What are you doing
> this for? How about using the system work queue?
> 

Look through the code carefully, especially at uart_tx_work_fn(), which
calls do_send_message().

do_send_message() can sleep for a long time. If I use the shared
workqueue, I block ALL OTHER USERS of the queue for the time I am
asleep. They cannot run.

The LDD3 book says not to use the shared workqueue if you are going to
sleep for long periods. I followed their advice.

> > +	/*--------------------------------------------------------------------*/
> > +	/* Ethernet Device Infrastructure                                     */
> > +	/*--------------------------------------------------------------------*/
> > +	struct net_device *ndev;
> 
> Why make this a separate structure? If you have one of these per net_device,
> you should embed the net_device into your own structure.
> 

This structure is embedded in struct net_device! Look at how
alloc_etherdev() works. You pass it the size of your private data
structure and it allocates the space for you.

> > +	struct tasklet_struct tx_complete_tasklet;
> 
> Using a tasklet for tx processing sounds fishy because most of the
> network code already runs at softirq time. You do not gain anything
> by another softirq context.
> 

I didn't want to run the TX cleanup routine at hard irq time, because it
can potentially take some time to run. I would rather run it with hard
interrupts enabled.

This DOES NOT do TX processing, it only frees skbs that have been
transferred. I used the network stack to do as much as possible, of
course.

> > +/*----------------------------------------------------------------------------*/
> > +/* Status Register Helper Operations                                          */
> > +/*----------------------------------------------------------------------------*/
> > +
> > +static DEFINE_SPINLOCK(status_lock);
> > +
> > +static void wqtstatus_setbit(struct wqt_dev *priv, u32 bit)
> > +{
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&status_lock, flags);
> > +	IMMR_W32(OMR1_OFFSET, IMMR_R32(OMR1_OFFSET) | bit);
> > +	spin_unlock_irqrestore(&status_lock, flags);
> > +}
> > +
> > +static void wqtstatus_clrbit(struct wqt_dev *priv, u32 bit)
> > +{
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&status_lock, flags);
> > +	IMMR_W32(OMR1_OFFSET, IMMR_R32(OMR1_OFFSET) & ~bit);
> > +	spin_unlock_irqrestore(&status_lock, flags);
> > +}
> 
> This looks misplaced, as mentioned before.
> 

I used these so I could have some remote status. Remember, this has to
work for Linux -> U-Boot, not just Linux <-> Linux.

While the board is booting Linux, the RAM I used to hold buffer
descriptors disappears! I need some shared status to make sure no stray
accesses happen to random areas of memory.

The case this potentially happens is:
1) Start Freescale board, do NOT install the driver
2) Insert the driver into the host system

Now, if the host tries writing to the buffer descriptors, we've
corrupted memory on the Freescale board.

> > +static int wqtstatus_remote_testbit(struct wqt_dev *priv, u32 bit)
> > +{
> > +	return IMMR_R32(IMR1_OFFSET) & bit;
> > +}
> 
> Are you sure that do not need a spinlock here? The lock also
> prevents reordering the device access arount the code using
> the mmio data.
> 

I don't think so. Note that this is a status bit on the remote side of
the connection. My access ordering means nothing when the remote side is
setting and clearing the bit.

> > +/*----------------------------------------------------------------------------*/
> > +/* Message Sending and Processing Operations                                  */
> > +/*----------------------------------------------------------------------------*/
> > +
> > +static irqreturn_t wqt_interrupt(int irq, void *dev_id)
> > +{
> > +	struct wqt_dev *priv = dev_id;
> > +	u32 imisr, idr;
> > +	unsigned long flags;
> > +
> > +	imisr = IMMR_R32(IMISR_OFFSET);
> > +	idr = IMMR_R32(IDR_OFFSET);
> > +
> > +	if (!(imisr & 0x8))
> > +		return IRQ_NONE;
> > +
> > +	/* Clear all of the interrupt sources, we'll handle them next */
> > +	IMMR_W32(IDR_OFFSET, idr);
> > +
> > +	/* Lock over all of the handlers, so they cannot get called when
> > +	 * the code doesn't expect them to be called */
> > +	spin_lock_irqsave(&priv->irq_lock, flags);
> 
> If this is in an interrupt handler, why disable the interrupts again?
> The same comment applies to many of the other places where you
> use spin_lock_irqsave rather than spin_lock or spin_lock_irq.
> 

I tried to make the locking do only what was needed. I just couldn't get
it correct unless I used spin_lock_irqsave(). I was able to get the
system to deadlock otherwise. This is why I posted the driver for
review, I could use some help here.

It isn't critical anyway. You can always use spin_lock_irqsave(), it is
just a little slower, but it will always work :)

> > +/* NOTE: All handlers are called with priv->irq_lock held */
> > +
> > +static void empty_handler(struct wqt_dev *priv)
> > +{
> > +	/* Intentionally left empty */
> > +}
> > +
> > +static void net_start_req_handler(struct wqt_dev *priv)
> > +{
> > +	schedule_work(&priv->net_start_work);
> > +}
> > +
> > +static void net_start_ack_handler(struct wqt_dev *priv)
> > +{
> > +	complete(&priv->net_start_completion);
> > +}
> > +
> > +static void net_stop_req_handler(struct wqt_dev *priv)
> > +{
> > +	schedule_work(&priv->net_stop_work);
> > +}
> > +
> > +static void net_stop_ack_handler(struct wqt_dev *priv)
> > +{
> > +	complete(&priv->net_stop_completion);
> > +}
> > +
> > +static void net_tx_complete_handler(struct wqt_dev *priv)
> > +{
> > +	tasklet_schedule(&priv->tx_complete_tasklet);
> > +}
> > +
> > +static void net_rx_packet_handler(struct wqt_dev *priv)
> > +{
> > +	wqtstatus_setbit(priv, PCINET_NET_RXINT_OFF);
> > +	netif_rx_schedule(priv->ndev, &priv->napi);
> > +}
> > +
> > +static void uart_rx_ready_handler(struct wqt_dev *priv)
> > +{
> > +	wqtuart_rx_char(&priv->port, IMMR_R32(IMR0_OFFSET) & 0xff);
> > +	IMMR_W32(ODR_OFFSET, UART_TX_EMPTY_DBELL);
> > +}
> > +
> > +static void uart_tx_empty_handler(struct wqt_dev *priv)
> > +{
> > +	priv->uart_tx_ready = true;
> > +	wake_up(&priv->uart_tx_wait);
> > +}
> 
> Since these all seem to be trivial functions, why go through the
> indirect function pointers at all? It seems that wqt_interrupt
> could do this just as well.
> 

I did this so they can be disabled independently. It is VERY tricky to
get the board started without the chance of corrupting memory. Write the
code, you'll see :)

You have to deal with two systems that can read/write each other's
memory, but DO NOT have any synchronization primitives.

> > +
> > +static int wqt_request_irq(struct wqt_dev *priv)
> > +{
> > +	int ret = 0;
> 
> no point initializing ret.
> 
> > +	mutex_lock(&priv->irq_mutex);
> 
> What data does the irq_mutex protect?
> 

priv->interrupt_count and the indirect handlers.

Note that I have two fundamentally seperate devices here. The network
and the uart.

What if noone has brought up the network interface? You'd still expect
the uart to work, right? In this case, I need to call request_irq() in
TWO places, the network's open() and the uart's open().

There is only a single interrupt for this hardware. When only the uart
is open, I only want the uart section of the irq handler enabled. This
is where the indirect handlers come in. On uart open(), I call
wqt_request_irq(), which will ONLY call request_irq() if it HAS NOT been
called yet.

Here are a few examples:
Ex1: uart only
1) uart open() is called
2) wqt_request_irq() is called
3) the uart irq handling functions are enabled
4) request_irq() is called, hooking up the irq
5) the uart is started

The network by itself works exactly the same.

Ex2: uart first, then network while uart is open
1) uart open() is called
2) wqt_request_irq() is called
3) the uart irq handling functions are enabled
4) request_irq() is called, hooking up the irq handler
5) the uart is started
6) time passes
7) network open() is called
8) wqt_request_irq() is called
9) request_irq() is NOT CALLED THIS TIME, because it is already enabled
10) the network irq handling functions are enabled
11) the network is started



Thanks so much for the review! I hope we can work together to get
something that can be merged into mainline Linux. I'm willing to write
code, I just need some direction from more experienced kernel
developers.

Ira
Arnd Bergmann Nov. 4, 2008, 8:23 p.m. UTC | #8
On Tuesday 04 November 2008, Ira Snyder wrote:
> On Tue, Nov 04, 2008 at 01:09:25PM +0100, Arnd Bergmann wrote:
> >
> > Why 'depends on !PCI'? This means that you cannot build a kernel that
> > is able to run both as host and endpoint for PCInet, right?
> > 
> 
> Yes, that is correct. I did this because the Linux PCI code does some
> relatively nasty things in agent mode. One thing that consistently
> crashed my box was running through the quirks list and trying to
> re-initialize an e100 that was in my box.
> 
> Remember, this is a PCI agent. It shouldn't be re-initializing the other
> hardware on the PCI bus.

Yes, that makes sense. However, you should still be able to have the
PCI code built into the kernel, as long as you prevent it from scanning
the bus on the machine that is in agent/endpoint mode.

This should be made clear in the device tree. On the QS22 machine, we
remove the "pci" device from the device tree, and add a "pcie-ep"
device.

> I left it optional so I could turn it on and off easily. I have no
> strong feelings on keeping it optional.
> 
> Does the PCI bus reliably transfer data? I'm not sure. I left it there
> so that we could at least turn on checksumming if there was a problem.

Yes, PCI guarantees reliable transfers.
 
> > > +struct circ_buf_desc {
> > > +	__le32 sc;
> > > +	__le32 len;
> > > +	__le32 addr;
> > > +} __attribute__((__packed__));
> > 
> > It would be useful to always force aligning the desciptors to the whole
> > 32 bit and avoid the packing here. Unaligned accesses are inefficient on
> > many systems.
> > 
> 
> I don't really know how to do that. I got a warning here from sparse
> telling me something about expensive pointer subtraction. Adding a dummy
> 32bit padding variable got rid of the warning, but I didn't change the
> driver.

Ok, I see. However, adding the packed attribute makes it more expensive
to use.

> > > +typedef struct circ_buf_desc cbd_t;
> > 
> > Also, don't pass structures by value if they don't fit into one or
> > two registers.
> > 
> 
> These are only used for pointers to the buffer descriptors (in RAM on
> the Freescale) that hold packet information. I never copy them directly.

Ok, then you should not have a typedef.

> > > +/* Buffer Descriptor Accessors */
> > > +#define CBDW_SC(_cbd, _sc) iowrite32((_sc), &(_cbd)->sc)
> > > +#define CBDW_LEN(_cbd, _len) iowrite32((_len), &(_cbd)->len)
> > > +#define CBDW_ADDR(_cbd, _addr) iowrite32((_addr), &(_cbd)->addr)
> > > +
> > > +#define CBDR_SC(_cbd) ioread32(&(_cbd)->sc)
> > > +#define CBDR_LEN(_cbd) ioread32(&(_cbd)->len)
> > > +#define CBDR_ADDR(_cbd) ioread32(&(_cbd)->addr)
> > 
> > We have found that accessing remote descriptors using mmio read is
> > rather slow, and changed the code to always do local reads and
> > remote writes.
> > 
> Interesting. I don't know how you would get network speed doing this.
> X86 systems don't have a DMA conttroller. The entire purpose of making
> the Freescale do all the copying was to use its DMA controller.
> 
> Using the DMA controller to transfer all of the data took my transfer
> speed from ~3MB/sec to ~45MB/sec. While that is a good increase, it
> could be better. I should be able to hit close to 133MB/sec (the limit
> of PCI)

Then I think I misunderstood something about this driver. Are these
descriptors accessed by the DMA engine, or by software? If it's the
DMA engine accessing them, can you put the descriptors on both sides
of the bus rather than just on one side?

Which side allocates them anyway? Since you use ioread32/iowrite32
on the ppc side, it looks like they are on the PCI host, which does
not seem to make much sense, because the ppc memory is much closer
to the DMA engine?

Obviously, you want the DMA engine to do the data transfers, but here, you
use ioread32 for mmio transfers to the descriptors, which is slow.

> Correct. This was done to make both sides as identical as possible. The
> Freescale exports the entire 1MB block of IMMR registers at PCI BAR0. So
> I have to use the offsets on the host side.
> 
> From the client side, I could just map what I need, but that would make
> the two drivers diverge. I was trying to keep them the same.

Ah, I see. We had the same problem on Axon, and I'm still looking for a
good solution. The best option is probably to abstract the immr access
in some way and provide a driver that implements them on top of PCI.
> 
> > > +static void wqtuart_rx_char(struct uart_port *port, const char ch);
> > > +static void wqtuart_stop_tx(struct uart_port *port);
> > 
> > You should try to avoid forward declarations for static functions.
> > If you order the function implementation correctly, that will
> > also give you the expected reading order in the driver.
> > 
> 
> Yep, I tried to do this. I couldn't figure out a sane ordering that
> would work. I tried to keep the network and uart as seperate as possible
> in the code.

I'd suggest splitting the uart code into a separate driver then.

> > > +struct wqt_dev {
> > > +	/*--------------------------------------------------------------------*/
> > > +	/* OpenFirmware Infrastructure                                        */
> > > +	/*--------------------------------------------------------------------*/
> > > +	struct of_device *op;
> > > +	struct device *dev;
> > 
> > Why the dev? You can always get that from the of_device, right?
> > 
> 
> Yes. I stored it there to make it identical to the host driver. By doing
> this, both drivers have code that says "dev_debug(priv->dev, ...)"
> rather than:
> 
> Host:
> dev_debug(&priv->pdev->dev, ...)
> 
> Freescale:
> dev_debug(&priv->op->dev, ...)

Ok. You can just store the dev pointer then, and leave out the op pointer.
You can always do a container_of() to get back to it.
 
> Yes, I agree. How do you make two Linux drivers that can be loaded for
> the same hardware at the same time? :) AFAIK, you cannot.
> 
> I NEED two functions accessible at the same time, network (to transfer
> data) and uart (to control my bootloader).
> 
> I use the uart to interact with the bootloader (U-Boot) and tell it
> where to tftp a kernel. I use the network to transfer the kernel.
> 
> So you see, I really do need them both at the same time. If you know a
> better way to do this, please let me know!
> 
> It was possible to write seperate U-Boot drivers, but only by being
> careful to not conflict in my usage of the hardware.

Ok, I see. I fear any nice solution would make the u-boot drivers much
more complex.
 
> > > +	struct workqueue_struct *wq;
> > 
> > A per-device pointer to a workqueue_struct is unusual. What are you doing
> > this for? How about using the system work queue?
> > 
> 
> Look through the code carefully, especially at uart_tx_work_fn(), which
> calls do_send_message().
> 
> do_send_message() can sleep for a long time. If I use the shared
> workqueue, I block ALL OTHER USERS of the queue for the time I am
> asleep. They cannot run.

ok.

> The LDD3 book says not to use the shared workqueue if you are going to
> sleep for long periods. I followed their advice.

ok.

> > > +	/*--------------------------------------------------------------------*/
> > > +	/* Ethernet Device Infrastructure                                     */
> > > +	/*--------------------------------------------------------------------*/
> > > +	struct net_device *ndev;
> > 
> > Why make this a separate structure? If you have one of these per net_device,
> > you should embed the net_device into your own structure.
> > 
> 
> This structure is embedded in struct net_device! Look at how
> alloc_etherdev() works. You pass it the size of your private data
> structure and it allocates the space for you.

right, I remember now. Unfortunately, alloc_etherdev is a little bit
different from many other kernel interfaces.

> > > +	struct tasklet_struct tx_complete_tasklet;
> > 
> > Using a tasklet for tx processing sounds fishy because most of the
> > network code already runs at softirq time. You do not gain anything
> > by another softirq context.
> > 
> 
> I didn't want to run the TX cleanup routine at hard irq time, because it
> can potentially take some time to run. I would rather run it with hard
> interrupts enabled.

sure.

> This DOES NOT do TX processing, it only frees skbs that have been
> transferred. I used the network stack to do as much as possible, of
> course.

Most drivers now do that from the *rx* poll function, and call
netif_rx_schedule when they get a tx interrupt.

> > If this is in an interrupt handler, why disable the interrupts again?
> > The same comment applies to many of the other places where you
> > use spin_lock_irqsave rather than spin_lock or spin_lock_irq.
> > 
> 
> I tried to make the locking do only what was needed. I just couldn't get
> it correct unless I used spin_lock_irqsave(). I was able to get the
> system to deadlock otherwise. This is why I posted the driver for
> review, I could use some help here.
> 
> It isn't critical anyway. You can always use spin_lock_irqsave(), it is
> just a little slower, but it will always work :)

I like the documenting character of the spinlock functions. E.g. if you
use spin_lock_irq() in a function, it is obvious that interrupts are enabled,
and if you use spin_lock() on a lock that requires disabling interrupts,
you know that interrupts are already off.
 
> Thanks so much for the review! I hope we can work together to get
> something that can be merged into mainline Linux. I'm willing to write
> code, I just need some direction from more experienced kernel
> developers.

Great, I can certainly help with that. Please CC me on anything related
to this driver.

	Arnd <><
Ira Snyder Nov. 4, 2008, 9:25 p.m. UTC | #9
On Tue, Nov 04, 2008 at 09:23:03PM +0100, Arnd Bergmann wrote:
> On Tuesday 04 November 2008, Ira Snyder wrote:
> > On Tue, Nov 04, 2008 at 01:09:25PM +0100, Arnd Bergmann wrote:
> > >
> > > Why 'depends on !PCI'? This means that you cannot build a kernel that
> > > is able to run both as host and endpoint for PCInet, right?
> > > 
> > 
> > Yes, that is correct. I did this because the Linux PCI code does some
> > relatively nasty things in agent mode. One thing that consistently
> > crashed my box was running through the quirks list and trying to
> > re-initialize an e100 that was in my box.
> > 
> > Remember, this is a PCI agent. It shouldn't be re-initializing the other
> > hardware on the PCI bus.
> 
> Yes, that makes sense. However, you should still be able to have the
> PCI code built into the kernel, as long as you prevent it from scanning
> the bus on the machine that is in agent/endpoint mode.
> 
> This should be made clear in the device tree. On the QS22 machine, we
> remove the "pci" device from the device tree, and add a "pcie-ep"
> device.
> 

Ok, that makes perfect sense. I'll test it at some point and make sure
that the kernel doesn't go through the quirks list, but it sounds
reasonable to assume it doesn't.

> > I left it optional so I could turn it on and off easily. I have no
> > strong feelings on keeping it optional.
> > 
> > Does the PCI bus reliably transfer data? I'm not sure. I left it there
> > so that we could at least turn on checksumming if there was a problem.
> 
> Yes, PCI guarantees reliable transfers.
>  

Great, I didn't know that. I'll turn it off unconditionally. Disabling
the checksumming gave me a few extra MB/sec.

> > > > +struct circ_buf_desc {
> > > > +	__le32 sc;
> > > > +	__le32 len;
> > > > +	__le32 addr;
> > > > +} __attribute__((__packed__));
> > > 
> > > It would be useful to always force aligning the desciptors to the whole
> > > 32 bit and avoid the packing here. Unaligned accesses are inefficient on
> > > many systems.
> > > 
> > 
> > I don't really know how to do that. I got a warning here from sparse
> > telling me something about expensive pointer subtraction. Adding a dummy
> > 32bit padding variable got rid of the warning, but I didn't change the
> > driver.
> 
> Ok, I see. However, adding the packed attribute makes it more expensive
> to use.
> 

Ok. Is there any way to make sure that the structure compiles to the
same representation on the host and agent system without using packed?

> > > > +typedef struct circ_buf_desc cbd_t;
> > > 
> > > Also, don't pass structures by value if they don't fit into one or
> > > two registers.
> > > 
> > 
> > These are only used for pointers to the buffer descriptors (in RAM on
> > the Freescale) that hold packet information. I never copy them directly.
> 
> Ok, then you should not have a typedef.
> 

Ok, it is gone in my latest version.

> > > > +/* Buffer Descriptor Accessors */
> > > > +#define CBDW_SC(_cbd, _sc) iowrite32((_sc), &(_cbd)->sc)
> > > > +#define CBDW_LEN(_cbd, _len) iowrite32((_len), &(_cbd)->len)
> > > > +#define CBDW_ADDR(_cbd, _addr) iowrite32((_addr), &(_cbd)->addr)
> > > > +
> > > > +#define CBDR_SC(_cbd) ioread32(&(_cbd)->sc)
> > > > +#define CBDR_LEN(_cbd) ioread32(&(_cbd)->len)
> > > > +#define CBDR_ADDR(_cbd) ioread32(&(_cbd)->addr)
> > > 
> > > We have found that accessing remote descriptors using mmio read is
> > > rather slow, and changed the code to always do local reads and
> > > remote writes.
> > > 
> > Interesting. I don't know how you would get network speed doing this.
> > X86 systems don't have a DMA conttroller. The entire purpose of making
> > the Freescale do all the copying was to use its DMA controller.
> > 
> > Using the DMA controller to transfer all of the data took my transfer
> > speed from ~3MB/sec to ~45MB/sec. While that is a good increase, it
> > could be better. I should be able to hit close to 133MB/sec (the limit
> > of PCI)
> 
> Then I think I misunderstood something about this driver. Are these
> descriptors accessed by the DMA engine, or by software? If it's the
> DMA engine accessing them, can you put the descriptors on both sides
> of the bus rather than just on one side?
> 

I access the descriptors in software, and program the DMA controller to
transfer the data. They are not directly used by the hardware.

I used the DMAEngine API to interact with the DMA controller. I tried
programming them manually, but the DMAEngine API was about 10 MB/sec
faster than I could achieve by hand.

See dma_async_copy_raw_to_buf() and dma_async_copy_buf_to_raw() in the
PowerPC code.

The basics of the network driver are as follows:
1) PowerPC allocates 4k of RAM for buffer descriptors, and
   exposes it over PCI in BAR 1
2) Host initializes all buffer descriptors to zero
3) Host allocates RING_SIZE 64K skb's, and puts them in the RX ring

On PowerPC hard_start_xmit():
1) Find the next free buffer in the RX ring, get the address stored
   inside it
2) DMA the packet given to us by the network stack to that address
3) Mark the buffer descriptor used
4) Interrupt the host

On Host hard_start_xmit():
1) Find the next free buffer descriptor in the TX ring
2) dma_map_single() and put the address into the buffer descriptor
3) Mark the buffer descriptor as used
4) Interrupt the PowerPC

On PowerPC rx_napi(): (scheduled by interrupt)
1) Find the next dirty buffer in the TX ring, get the address and len
2) Allocate an skb of this len
3) DMA the data into the new skb
4) Pass the new skb up into the kernel
5) Mark the buffer as freeable
6) Loop until done

On Host rx_napi():
1) Find the next dirty buffer in the RX ring, get the pointer to it in
   the list of allocated skbs
2) Allocate a new 64K skb
3) Put the new skb into the buffer descriptors, mark it as clean
4) Push the skb (from the RX ring) into the kernel
5) Loop until done

So, you'll notice that I only copy the data over the PCI bus once,
directly into the skb it is supposed to be going into. The buffer
descriptors are there so I know where to find the skb in host memory
across the PCI bus.

Hopefully that's a good description. :) It seems to me that both sides
of the connection need to read the descriptors (to get packet length,
clean up dirty packets, etc.) and write them (to set packet length, mark
packets dirty, etc.) I just can't come up with something that is
local-read / remote-write only.

> Which side allocates them anyway? Since you use ioread32/iowrite32
> on the ppc side, it looks like they are on the PCI host, which does
> not seem to make much sense, because the ppc memory is much closer
> to the DMA engine?
> 

The PowerPC allocates them. They are accessible via PCI BAR1. They live
in regular RAM on the PowerPC. I can't remember why I used
ioread32/iowrite32 anymore. I'll try again with in_le32()/out_le32() on
the PowerPC system, and see what happens.

> Obviously, you want the DMA engine to do the data transfers, but here, you
> use ioread32 for mmio transfers to the descriptors, which is slow.
> 

I didn't know it was slow :) Maybe this is why I had to make the MTU
very large to get good speed. Using a standard 1500 byte MTU I get
<10 MB/sec transfer speed. Using a 64K MTU, I get ~45MB/sec transfer
speed.

Do I need to do any sort of flushing to make sure that the read has
actually gone out of cache and into memory? When the host accesses the
buffer descriptors over PCI, it can only view memory. If a write is
still in the PowerPC cache, the host will get stale data.

> > Correct. This was done to make both sides as identical as possible. The
> > Freescale exports the entire 1MB block of IMMR registers at PCI BAR0. So
> > I have to use the offsets on the host side.
> > 
> > From the client side, I could just map what I need, but that would make
> > the two drivers diverge. I was trying to keep them the same.
> 
> Ah, I see. We had the same problem on Axon, and I'm still looking for a
> good solution. The best option is probably to abstract the immr access
> in some way and provide a driver that implements them on top of PCI.
> > 
> > > > +static void wqtuart_rx_char(struct uart_port *port, const char ch);
> > > > +static void wqtuart_stop_tx(struct uart_port *port);
> > > 
> > > You should try to avoid forward declarations for static functions.
> > > If you order the function implementation correctly, that will
> > > also give you the expected reading order in the driver.
> > > 
> > 
> > Yep, I tried to do this. I couldn't figure out a sane ordering that
> > would work. I tried to keep the network and uart as seperate as possible
> > in the code.
> 
> I'd suggest splitting the uart code into a separate driver then.
> 

How? In Linux we can only have one driver for a certain set of hardware.
I use the messaging unit to do both network (interrupts and status bits)
and uart (interrupts and message transfer).

Both the network and uart _must_ run at the same time. This way I can
type into the bootloader prompt to start a network transfer, and watch
it complete.

Remember, I can't have a real serial console plugged into this board.
I'll be using this with about 150 boards in 8 separate chassis, which
makes cabling a nightmare. I'm trying to do as much as possible with the
PCI backplane.

> > > > +struct wqt_dev {
> > > > +	/*--------------------------------------------------------------------*/
> > > > +	/* OpenFirmware Infrastructure                                        */
> > > > +	/*--------------------------------------------------------------------*/
> > > > +	struct of_device *op;
> > > > +	struct device *dev;
> > > 
> > > Why the dev? You can always get that from the of_device, right?
> > > 
> > 
> > Yes. I stored it there to make it identical to the host driver. By doing
> > this, both drivers have code that says "dev_debug(priv->dev, ...)"
> > rather than:
> > 
> > Host:
> > dev_debug(&priv->pdev->dev, ...)
> > 
> > Freescale:
> > dev_debug(&priv->op->dev, ...)
> 
> Ok. You can just store the dev pointer then, and leave out the op pointer.
> You can always do a container_of() to get back to it.
>  

True, I didn't think of that. I'll make that change.

> > Yes, I agree. How do you make two Linux drivers that can be loaded for
> > the same hardware at the same time? :) AFAIK, you cannot.
> > 
> > I NEED two functions accessible at the same time, network (to transfer
> > data) and uart (to control my bootloader).
> > 
> > I use the uart to interact with the bootloader (U-Boot) and tell it
> > where to tftp a kernel. I use the network to transfer the kernel.
> > 
> > So you see, I really do need them both at the same time. If you know a
> > better way to do this, please let me know!
> > 
> > It was possible to write seperate U-Boot drivers, but only by being
> > careful to not conflict in my usage of the hardware.
> 
> Ok, I see. I fear any nice solution would make the u-boot drivers much
> more complex.
>  

Perhaps. I'm perfectly willing to port things to U-Boot. Especially if
we can make something generic enough to be re-used by many different
boards. Recently, another person on the U-Boot list has shown a need for
this kind of solution.

> > > > +	/*--------------------------------------------------------------------*/
> > > > +	/* Ethernet Device Infrastructure                                     */
> > > > +	/*--------------------------------------------------------------------*/
> > > > +	struct net_device *ndev;
> > > 
> > > Why make this a separate structure? If you have one of these per net_device,
> > > you should embed the net_device into your own structure.
> > > 
> > 
> > This structure is embedded in struct net_device! Look at how
> > alloc_etherdev() works. You pass it the size of your private data
> > structure and it allocates the space for you.
> 
> right, I remember now. Unfortunately, alloc_etherdev is a little bit
> different from many other kernel interfaces.
> 

Yep. It sure is :)

> > > > +	struct tasklet_struct tx_complete_tasklet;
> > > 
> > > Using a tasklet for tx processing sounds fishy because most of the
> > > network code already runs at softirq time. You do not gain anything
> > > by another softirq context.
> > > 
> > 
> > I didn't want to run the TX cleanup routine at hard irq time, because it
> > can potentially take some time to run. I would rather run it with hard
> > interrupts enabled.
> 
> sure.
> 
> > This DOES NOT do TX processing, it only frees skbs that have been
> > transferred. I used the network stack to do as much as possible, of
> > course.
> 
> Most drivers now do that from the *rx* poll function, and call
> netif_rx_schedule when they get a tx interrupt.
> 

That is an interesting concept. I'll look around the drivers/net tree
and try to find one that works this way. It should be pretty easy to
implement, though. I'll try it out.

> > > If this is in an interrupt handler, why disable the interrupts again?
> > > The same comment applies to many of the other places where you
> > > use spin_lock_irqsave rather than spin_lock or spin_lock_irq.
> > > 
> > 
> > I tried to make the locking do only what was needed. I just couldn't get
> > it correct unless I used spin_lock_irqsave(). I was able to get the
> > system to deadlock otherwise. This is why I posted the driver for
> > review, I could use some help here.
> > 
> > It isn't critical anyway. You can always use spin_lock_irqsave(), it is
> > just a little slower, but it will always work :)
> 
> I like the documenting character of the spinlock functions. E.g. if you
> use spin_lock_irq() in a function, it is obvious that interrupts are enabled,
> and if you use spin_lock() on a lock that requires disabling interrupts,
> you know that interrupts are already off.
>  

True. I just couldn't seem to get it right. I'll try again. Perhaps it
was another bug in the driver that I hadn't found at the time.

> > Thanks so much for the review! I hope we can work together to get
> > something that can be merged into mainline Linux. I'm willing to write
> > code, I just need some direction from more experienced kernel
> > developers.
> 
> Great, I can certainly help with that. Please CC me on anything related
> to this driver.

Will do. Please CC me on anything similar that you run across as well.
:)

Ira
Ira Snyder Nov. 4, 2008, 10:29 p.m. UTC | #10
On Tue, Nov 04, 2008 at 09:23:03PM +0100, Arnd Bergmann wrote:

Big snip.

> > 
> > I tried to make the locking do only what was needed. I just couldn't get
> > it correct unless I used spin_lock_irqsave(). I was able to get the
> > system to deadlock otherwise. This is why I posted the driver for
> > review, I could use some help here.
> > 
> > It isn't critical anyway. You can always use spin_lock_irqsave(), it is
> > just a little slower, but it will always work :)
> 
> I like the documenting character of the spinlock functions. E.g. if you
> use spin_lock_irq() in a function, it is obvious that interrupts are enabled,
> and if you use spin_lock() on a lock that requires disabling interrupts,
> you know that interrupts are already off.
>  

Ok, I've started addressing your comments. I'm having trouble with the
locking again. Let me see if my understanding is correct:

spin_lock_irqsave() disables interrupts and saves the interrupt state
spin_unlock_irqrestore() MAY re-enable interrupts based on the saved
flags

spin_lock_irq() disables interrupts, and always turns them back on when
spin_unlock_irq() is called

spin_lock_bh() disables softirqs, spin_unlock_bh() re-enables them

spin_lock() and spin_unlock() are just regular spinlocks


So, since interrupts are disabled while my interrupt handler is running,
I think I should be able to use spin_lock() and spin_unlock(), correct?

But sparse gives me the following warning:
wqt.c:185:9: warning: context imbalance in 'wqt_interrupt': wrong count at exit
wqt.c:185:9:    context 'lock': wanted 0, got 1

If I'm using spin_lock_irqsave() and spin_lock_irqrestore() I do not get
the same warnings. Therefore I must have some misunderstanding :)

Thanks,
Ira
Arnd Bergmann Nov. 5, 2008, 1:19 p.m. UTC | #11
On Tuesday 04 November 2008, Ira Snyder wrote:

> So, since interrupts are disabled while my interrupt handler is running,
> I think I should be able to use spin_lock() and spin_unlock(), correct?

yes.

> But sparse gives me the following warning:
> wqt.c:185:9: warning: context imbalance in 'wqt_interrupt': wrong count at exit
> wqt.c:185:9:    context 'lock': wanted 0, got 1
> 
> If I'm using spin_lock_irqsave() and spin_lock_irqrestore() I do not get
> the same warnings. Therefore I must have some misunderstanding :)


I've seen something like that before, I think it was a bug either in
sparse or in the powerpc platform code. Try updating both the kernel
and sparse and see if it still happens.

	Arnd <><
Arnd Bergmann Nov. 5, 2008, 1:50 p.m. UTC | #12
On Tuesday 04 November 2008, Ira Snyder wrote:
> On Tue, Nov 04, 2008 at 09:23:03PM +0100, Arnd Bergmann wrote:
> > On Tuesday 04 November 2008, Ira Snyder wrote:
> > > I don't really know how to do that. I got a warning here from sparse
> > > telling me something about expensive pointer subtraction. Adding a dummy
> > > 32bit padding variable got rid of the warning, but I didn't change the
> > > driver.
> > 
> > Ok, I see. However, adding the packed attribute makes it more expensive
> > to use.
> > 
> 
> Ok. Is there any way to make sure that the structure compiles to the
> same representation on the host and agent system without using packed?

Only knowledge about the alignment on all the possible architectures ;-)
As a simplified rule, always pad every struct member to the largest
other member in the struct and always use explicitly sized types like
__u8 or __le32.
 
> Hopefully that's a good description. :) It seems to me that both sides
> of the connection need to read the descriptors (to get packet length,
> clean up dirty packets, etc.) and write them (to set packet length, mark
> packets dirty, etc.) I just can't come up with something that is
> local-read / remote-write only.

If I understand your description correctly, the only remote read is
when the host accesses the buffer descriptors to find free space.
Avoiding this read access may improve the latency a bit. In our ring
buffer concept, both host and endpoint allocate a memory buffer that
gets ioremapped into the remote side. Since you always need to read
the descriptors from powerpc, you should probably keep them in powerpc
memory, but you can change the code so that for finding the next
free entry, the host will look in its own memory for the number of the
next entry, and the powerpc side will write that when it consumes a
descriptor to mark it as free.

> > Which side allocates them anyway? Since you use ioread32/iowrite32
> > on the ppc side, it looks like they are on the PCI host, which does
> > not seem to make much sense, because the ppc memory is much closer
> > to the DMA engine?
> > 
> 
> The PowerPC allocates them. They are accessible via PCI BAR1. They live
> in regular RAM on the PowerPC. I can't remember why I used
> ioread32/iowrite32 anymore. I'll try again with in_le32()/out_le32() on
> the PowerPC system, and see what happens.

Actually, if they are in powerpc RAM, you must not neither in_le32 nor
ioread32. Both are only well-defined on I/O devices (local bus or PCI,
respectively). Instead, you should use directly access the buffer using
pointer dereferences, and use rmb()/wmb() to make sure anything you
access is synchronized with the host.

> > Obviously, you want the DMA engine to do the data transfers, but here, you
> > use ioread32 for mmio transfers to the descriptors, which is slow.
> > 
> 
> I didn't know it was slow :) Maybe this is why I had to make the MTU
> very large to get good speed. Using a standard 1500 byte MTU I get
> <10 MB/sec transfer speed. Using a 64K MTU, I get ~45MB/sec transfer
> speed.
> 
> Do I need to do any sort of flushing to make sure that the read has
> actually gone out of cache and into memory? When the host accesses the
> buffer descriptors over PCI, it can only view memory. If a write is
> still in the PowerPC cache, the host will get stale data.

The access over the bus is cache-coherent, unless you are on one of the
more obscure powerpc implementations. This means you do not have a
problem with data still being in cache. However, you need to make
sure that data arrives in the right order. DMA read accesses over PCI
may be reordered, and you need a wmb() between two memory stores if you
want to be sure that the host sees them in the correct order.

> > > Yep, I tried to do this. I couldn't figure out a sane ordering that
> > > would work. I tried to keep the network and uart as seperate as possible
> > > in the code.
> > 
> > I'd suggest splitting the uart code into a separate driver then.
> > 
> 
> How? In Linux we can only have one driver for a certain set of hardware.
> I use the messaging unit to do both network (interrupts and status bits)
> and uart (interrupts and message transfer).
> 
> Both the network and uart _must_ run at the same time. This way I can
> type into the bootloader prompt to start a network transfer, and watch
> it complete.
> 
> Remember, I can't have a real serial console plugged into this board.
> I'll be using this with about 150 boards in 8 separate chassis, which
> makes cabling a nightmare. I'm trying to do as much as possible with the
> PCI backplane.
 
When splitting out the hardware specific parts, I would write a device
driver for the messaging unit that knows about neither the uart nor the
network (or any other high-level protocol). It's a bit more complicated to
load the two high-level drivers in that case, but one clean way to do
it would be to instantiate a new bus-type from the MU driver and have
that driver register devices for itself. Then you can load the high-level
driver through udev or have them built into the kernel.

To get really fancy, you could find a way for the host to announce what
protocols are supported on though the MU. A use case for that, which I
have been thinking about before, would be to allow the host to set up
direct virtual point-to-point networks between two endpoints, not involving
the host at all once the device is up.

	Arnd <><
Ira Snyder Nov. 5, 2008, 7:18 p.m. UTC | #13
On Wed, Nov 05, 2008 at 02:19:52PM +0100, Arnd Bergmann wrote:
> On Tuesday 04 November 2008, Ira Snyder wrote:
> 
> > So, since interrupts are disabled while my interrupt handler is running,
> > I think I should be able to use spin_lock() and spin_unlock(), correct?
> 
> yes.
> 
> > But sparse gives me the following warning:
> > wqt.c:185:9: warning: context imbalance in 'wqt_interrupt': wrong count at exit
> > wqt.c:185:9:    context 'lock': wanted 0, got 1
> > 
> > If I'm using spin_lock_irqsave() and spin_lock_irqrestore() I do not get
> > the same warnings. Therefore I must have some misunderstanding :)
> 
> 
> I've seen something like that before, I think it was a bug either in
> sparse or in the powerpc platform code. Try updating both the kernel
> and sparse and see if it still happens.
> 

Ok, I had the latest sparse, but I've been developing the host driver
against Ubuntu 8.04's kernel, which is based on 2.6.24. Using Linus' git
tree resolved the errors from sparse.

I ended up finding that the dma_mapping_error() API has changed, and so
I updated the driver for that. It should build against 2.6.28-rc now.
I'll post it in a little while.

Ira
Ira Snyder Nov. 5, 2008, 7:32 p.m. UTC | #14
On Wed, Nov 05, 2008 at 02:50:59PM +0100, Arnd Bergmann wrote:
> On Tuesday 04 November 2008, Ira Snyder wrote:
> > On Tue, Nov 04, 2008 at 09:23:03PM +0100, Arnd Bergmann wrote:
> > > On Tuesday 04 November 2008, Ira Snyder wrote:
> > > > I don't really know how to do that. I got a warning here from sparse
> > > > telling me something about expensive pointer subtraction. Adding a dummy
> > > > 32bit padding variable got rid of the warning, but I didn't change the
> > > > driver.
> > > 
> > > Ok, I see. However, adding the packed attribute makes it more expensive
> > > to use.
> > > 
> > 
> > Ok. Is there any way to make sure that the structure compiles to the
> > same representation on the host and agent system without using packed?
> 
> Only knowledge about the alignment on all the possible architectures ;-)
> As a simplified rule, always pad every struct member to the largest
> other member in the struct and always use explicitly sized types like
> __u8 or __le32.
>  

Ok, I tried using:
struct circ_buf_desc {
	__le32 sc;
	__le32 len;
	__le32 addr;
	__le32 padding;
};

The structure came out the same size on both x86 and powerpc, and the
driver still works. I'll put this change in the driver.

> > Hopefully that's a good description. :) It seems to me that both sides
> > of the connection need to read the descriptors (to get packet length,
> > clean up dirty packets, etc.) and write them (to set packet length, mark
> > packets dirty, etc.) I just can't come up with something that is
> > local-read / remote-write only.
> 
> If I understand your description correctly, the only remote read is
> when the host accesses the buffer descriptors to find free space.
> Avoiding this read access may improve the latency a bit. In our ring
> buffer concept, both host and endpoint allocate a memory buffer that
> gets ioremapped into the remote side. Since you always need to read
> the descriptors from powerpc, you should probably keep them in powerpc
> memory, but you can change the code so that for finding the next
> free entry, the host will look in its own memory for the number of the
> next entry, and the powerpc side will write that when it consumes a
> descriptor to mark it as free.
> 

There are a few remote reads (from the host).

1) in hard_start_xmit() to make sure the queue is stopped
2) in wqt_tx_complete() to find the buffer descriptors that have been
   consumed
3) in wqt_rx_napi() to find the buffer descriptors that have been
   dirtied
4) in wqt_rx_napi() to get the actual packet length of a dirty buffer

I /could/ come up with a scheme to use only writes, but it seems much
too complicated for a little performance.

I'll keep it in mind, and change it during performance tuning (if
needed).

> > > Which side allocates them anyway? Since you use ioread32/iowrite32
> > > on the ppc side, it looks like they are on the PCI host, which does
> > > not seem to make much sense, because the ppc memory is much closer
> > > to the DMA engine?
> > > 
> > 
> > The PowerPC allocates them. They are accessible via PCI BAR1. They live
> > in regular RAM on the PowerPC. I can't remember why I used
> > ioread32/iowrite32 anymore. I'll try again with in_le32()/out_le32() on
> > the PowerPC system, and see what happens.
> 
> Actually, if they are in powerpc RAM, you must not neither in_le32 nor
> ioread32. Both are only well-defined on I/O devices (local bus or PCI,
> respectively). Instead, you should use directly access the buffer using
> pointer dereferences, and use rmb()/wmb() to make sure anything you
> access is synchronized with the host.
> 

I've changed it to just do pointer dereferences, and it works just fine.
I added appropriate wmb() (I think...)

> > > Obviously, you want the DMA engine to do the data transfers, but here, you
> > > use ioread32 for mmio transfers to the descriptors, which is slow.
> > > 
> > 
> > I didn't know it was slow :) Maybe this is why I had to make the MTU
> > very large to get good speed. Using a standard 1500 byte MTU I get
> > <10 MB/sec transfer speed. Using a 64K MTU, I get ~45MB/sec transfer
> > speed.
> > 
> > Do I need to do any sort of flushing to make sure that the read has
> > actually gone out of cache and into memory? When the host accesses the
> > buffer descriptors over PCI, it can only view memory. If a write is
> > still in the PowerPC cache, the host will get stale data.
> 
> The access over the bus is cache-coherent, unless you are on one of the
> more obscure powerpc implementations. This means you do not have a
> problem with data still being in cache. However, you need to make
> sure that data arrives in the right order. DMA read accesses over PCI
> may be reordered, and you need a wmb() between two memory stores if you
> want to be sure that the host sees them in the correct order.
> 

Ok.

> > > > Yep, I tried to do this. I couldn't figure out a sane ordering that
> > > > would work. I tried to keep the network and uart as seperate as possible
> > > > in the code.
> > > 
> > > I'd suggest splitting the uart code into a separate driver then.
> > > 
> > 
> > How? In Linux we can only have one driver for a certain set of hardware.
> > I use the messaging unit to do both network (interrupts and status bits)
> > and uart (interrupts and message transfer).
> > 
> > Both the network and uart _must_ run at the same time. This way I can
> > type into the bootloader prompt to start a network transfer, and watch
> > it complete.
> > 
> > Remember, I can't have a real serial console plugged into this board.
> > I'll be using this with about 150 boards in 8 separate chassis, which
> > makes cabling a nightmare. I'm trying to do as much as possible with the
> > PCI backplane.
>  
> When splitting out the hardware specific parts, I would write a device
> driver for the messaging unit that knows about neither the uart nor the
> network (or any other high-level protocol). It's a bit more complicated to
> load the two high-level drivers in that case, but one clean way to do
> it would be to instantiate a new bus-type from the MU driver and have
> that driver register devices for itself. Then you can load the high-level
> driver through udev or have them built into the kernel.
> 
> To get really fancy, you could find a way for the host to announce what
> protocols are supported on though the MU. A use case for that, which I
> have been thinking about before, would be to allow the host to set up
> direct virtual point-to-point networks between two endpoints, not involving
> the host at all once the device is up.
> 

This is getting pretty complicated, especially for someone just getting
started writing drivers. :)

It seems like you'd need a set of functions to access specific
interrupts / message boxes, maybe something like the gpio interface
(where each gpio pin has an integer number).

Both sides have to agree on what each interrupt means, and which mailbox
data transfer happens in. So each driver would still be pretty closely
tied to the hardware.

It seems like too much for someone like me to design. I'll leave it up
to the pro's.

Ira
diff mbox

Patch

diff --git a/arch/powerpc/boot/dts/mpc834x_mds.dts b/arch/powerpc/boot/dts/mpc834x_mds.dts
index c986c54..3bc8975 100644
--- a/arch/powerpc/boot/dts/mpc834x_mds.dts
+++ b/arch/powerpc/boot/dts/mpc834x_mds.dts
@@ -104,6 +104,13 @@ 
 			mode = "cpu";
 		};
 
+		message-unit@8030 {
+			compatible = "fsl,mpc8349-mu";
+			reg = <0x8030 0xd0>;
+			interrupts = <69 0x8>;
+			interrupt-parent = <&ipic>;
+		};
+
 		dma@82a8 {
 			#address-cells = <1>;
 			#size-cells = <1>;
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index f749b40..eef7af7 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2279,6 +2279,40 @@  config UGETH_TX_ON_DEMAND
 	bool "Transmit on Demand support"
 	depends on UCC_GETH
 
+config PCINET_FSL
+	tristate "PCINet Virtual Ethernet over PCI support (Freescale)"
+	depends on MPC834x_MDS && !PCI
+	select DMA_ENGINE
+	select FSL_DMA
+	help
+	  When running as a PCI Agent, this driver will create a virtual
+	  ethernet link running over the PCI bus, allowing simplified
+	  communication with the host system. The host system will need
+	  to use the corresponding driver.
+
+	  If in doubt, say N.
+
+config PCINET_HOST
+	tristate "PCINet Virtual Ethernet over PCI support (Host)"
+	depends on PCI
+	help
+	  This driver will let you communicate with a PCINet client device
+	  using a virtual ethernet link running over the PCI bus. This
+	  allows simplified communication with the client system.
+
+	  This is inteded for use in a system that has a crate full of
+	  computers running Linux, all connected by a PCI backplane.
+
+	  If in doubt, say N.
+
+config PCINET_DISABLE_CHECKSUM
+	bool "Disable packet checksumming"
+	depends on PCINET_FSL || PCINET_HOST
+	default n
+	help
+	  Disable packet checksumming on packets received by the PCINet
+	  driver. This gives a possible speed boost.
+
 config MV643XX_ETH
 	tristate "Marvell Discovery (643XX) and Orion ethernet support"
 	depends on MV64360 || MV64X60 || (PPC_MULTIPLATFORM && PPC32) || PLAT_ORION
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index f19acf8..c6fbafc 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -30,6 +30,9 @@  gianfar_driver-objs := gianfar.o \
 obj-$(CONFIG_UCC_GETH) += ucc_geth_driver.o
 ucc_geth_driver-objs := ucc_geth.o ucc_geth_mii.o ucc_geth_ethtool.o
 
+obj-$(CONFIG_PCINET_FSL) += pcinet_fsl.o
+obj-$(CONFIG_PCINET_HOST) += pcinet_host.o
+
 #
 # link order important here
 #
diff --git a/drivers/net/pcinet.h b/drivers/net/pcinet.h
new file mode 100644
index 0000000..66d2cba
--- /dev/null
+++ b/drivers/net/pcinet.h
@@ -0,0 +1,75 @@ 
+/*
+ * Shared Definitions for the PCINet / PCISerial drivers
+ *
+ * Copyright (c) 2008 Ira W. Snyder <iws@ovro.caltech.edu>
+ *
+ * Heavily inspired by the drivers/net/fs_enet driver
+ *
+ * This file is licensed under the terms of the GNU General Public License
+ * version 2. This program is licensed "as is" without any warranty of any
+ * kind, whether express or implied.
+ */
+
+#ifndef PCINET_H
+#define PCINET_H
+
+#include <linux/kernel.h>
+#include <linux/if_ether.h>
+
+/* Ring and Frame size -- these must match between the drivers */
+#define PH_RING_SIZE	(64)
+#define PH_MAX_FRSIZE	(64 * 1024)
+#define PH_MAX_MTU	(PH_MAX_FRSIZE - ETH_HLEN)
+
+struct circ_buf_desc {
+	__le32 sc;
+	__le32 len;
+	__le32 addr;
+} __attribute__((__packed__));
+typedef struct circ_buf_desc cbd_t;
+
+/* Buffer Descriptor Accessors */
+#define CBDW_SC(_cbd, _sc) iowrite32((_sc), &(_cbd)->sc)
+#define CBDW_LEN(_cbd, _len) iowrite32((_len), &(_cbd)->len)
+#define CBDW_ADDR(_cbd, _addr) iowrite32((_addr), &(_cbd)->addr)
+
+#define CBDR_SC(_cbd) ioread32(&(_cbd)->sc)
+#define CBDR_LEN(_cbd) ioread32(&(_cbd)->len)
+#define CBDR_ADDR(_cbd) ioread32(&(_cbd)->addr)
+
+/* Buffer Descriptor Registers */
+#define PCINET_TXBD_BASE	0x400
+#define PCINET_RXBD_BASE	0x800
+
+/* Buffer Descriptor Status */
+#define BD_MEM_READY		0x1
+#define BD_MEM_DIRTY		0x2
+#define BD_MEM_FREE		0x3
+
+/* IMMR Accessor Helpers */
+#define IMMR_R32(_off) ioread32(priv->immr+(_off))
+#define IMMR_W32(_off, _val) iowrite32((_val), priv->immr+(_off))
+#define IMMR_R32BE(_off) ioread32be(priv->immr+(_off))
+#define IMMR_W32BE(_off, _val) iowrite32be((_val), priv->immr+(_off))
+
+/* Status Register Bits */
+#define PCINET_UART_RX_ENABLED		(1<<0)
+#define PCINET_NET_STATUS_RUNNING	(1<<1)
+#define PCINET_NET_RXINT_OFF		(1<<2)
+#define PCINET_NET_REGISTERS_VALID	(1<<3)
+
+/* Driver State Bits */
+#define NET_STATE_STOPPED	0
+#define NET_STATE_RUNNING	1
+
+/* Doorbell Registers */
+#define UART_RX_READY_DBELL	(1<<0)
+#define UART_TX_EMPTY_DBELL	(1<<1)
+#define NET_RX_PACKET_DBELL	(1<<2)
+#define NET_TX_COMPLETE_DBELL	(1<<3)
+#define NET_START_REQ_DBELL	(1<<4)
+#define NET_START_ACK_DBELL	(1<<5)
+#define NET_STOP_REQ_DBELL	(1<<6)
+#define NET_STOP_ACK_DBELL	(1<<7)
+
+#endif /* PCINET_H */
diff --git a/drivers/net/pcinet_fsl.c b/drivers/net/pcinet_fsl.c
new file mode 100644
index 0000000..b03764c
--- /dev/null
+++ b/drivers/net/pcinet_fsl.c
@@ -0,0 +1,1351 @@ 
+/*
+ * PCINet and PCISerial Driver for Freescale MPC8349EMDS
+ *
+ * Copyright (c) 2008 Ira W. Snyder <iws@ovro.caltech.edu>
+ *
+ * Heavily inspired by the drivers/net/fs_enet driver
+ *
+ * This file is licensed under the terms of the GNU General Public License
+ * version 2. This program is licensed "as is" without any warranty of any
+ * kind, whether express or implied.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/of_platform.h>
+#include <linux/sched.h>
+#include <linux/wait.h>
+#include <linux/interrupt.h>
+#include <linux/irqreturn.h>
+#include <linux/delay.h>
+#include <linux/io.h>
+#include <linux/serial.h>
+#include <linux/serial_core.h>
+#include <linux/etherdevice.h>
+#include <linux/mutex.h>
+#include <linux/dmaengine.h>
+
+#include "pcinet.h"
+#include "pcinet_hw.h"
+
+static const char driver_name[] = "wqt";
+
+static void wqtuart_rx_char(struct uart_port *port, const char ch);
+static void wqtuart_stop_tx(struct uart_port *port);
+
+struct wqt_dev;
+typedef void (*wqt_irqhandler_t)(struct wqt_dev *);
+
+struct wqt_irqhandlers {
+	wqt_irqhandler_t net_start_req_handler;
+	wqt_irqhandler_t net_start_ack_handler;
+	wqt_irqhandler_t net_stop_req_handler;
+	wqt_irqhandler_t net_stop_ack_handler;
+	wqt_irqhandler_t net_rx_packet_handler;
+	wqt_irqhandler_t net_tx_complete_handler;
+	wqt_irqhandler_t uart_rx_ready_handler;
+	wqt_irqhandler_t uart_tx_empty_handler;
+};
+
+struct wqt_dev {
+	/*--------------------------------------------------------------------*/
+	/* OpenFirmware Infrastructure                                        */
+	/*--------------------------------------------------------------------*/
+	struct of_device *op;
+	struct device *dev;
+	int irq;
+	void __iomem *immr;
+
+	struct mutex irq_mutex;
+	int interrupt_count;
+
+	spinlock_t irq_lock;
+	struct wqt_irqhandlers handlers;
+
+	/*--------------------------------------------------------------------*/
+	/* UART Device Infrastructure                                         */
+	/*--------------------------------------------------------------------*/
+	struct uart_port port;
+	bool uart_rx_enabled;
+	bool uart_open;
+
+	struct workqueue_struct *wq;
+	struct work_struct uart_tx_work;
+	wait_queue_head_t uart_tx_wait; /* sleep for uart_tx_ready */
+	bool uart_tx_ready; /* transmitter state */
+
+	/*--------------------------------------------------------------------*/
+	/* Ethernet Device Infrastructure                                     */
+	/*--------------------------------------------------------------------*/
+	struct net_device *ndev;
+	void __iomem *netregs;
+	dma_addr_t netregs_addr;
+
+	/* Outstanding SKB */
+	struct sk_buff *tx_skbs[PH_RING_SIZE];
+
+	/* Circular Buffer Descriptor base */
+	cbd_t __iomem *rx_base;
+	cbd_t __iomem *tx_base;
+
+	/* Current SKB index */
+	cbd_t __iomem *cur_rx;
+	cbd_t __iomem *cur_tx;
+	cbd_t __iomem *dirty_tx;
+	int tx_free;
+
+	struct tasklet_struct tx_complete_tasklet;
+	spinlock_t net_lock;
+
+	struct mutex net_mutex;
+	int net_state;
+	struct work_struct net_start_work;
+	struct work_struct net_stop_work;
+	struct completion net_start_completion;
+	struct completion net_stop_completion;
+	struct napi_struct napi;
+
+	struct dma_client client;
+	struct dma_chan *chan;
+};
+
+/*----------------------------------------------------------------------------*/
+/* Status Register Helper Operations                                          */
+/*----------------------------------------------------------------------------*/
+
+static DEFINE_SPINLOCK(status_lock);
+
+static void wqtstatus_setbit(struct wqt_dev *priv, u32 bit)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&status_lock, flags);
+	IMMR_W32(OMR1_OFFSET, IMMR_R32(OMR1_OFFSET) | bit);
+	spin_unlock_irqrestore(&status_lock, flags);
+}
+
+static void wqtstatus_clrbit(struct wqt_dev *priv, u32 bit)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&status_lock, flags);
+	IMMR_W32(OMR1_OFFSET, IMMR_R32(OMR1_OFFSET) & ~bit);
+	spin_unlock_irqrestore(&status_lock, flags);
+}
+
+static int wqtstatus_remote_testbit(struct wqt_dev *priv, u32 bit)
+{
+	return IMMR_R32(IMR1_OFFSET) & bit;
+}
+
+/*----------------------------------------------------------------------------*/
+/* Message Sending and Processing Operations                                  */
+/*----------------------------------------------------------------------------*/
+
+static irqreturn_t wqt_interrupt(int irq, void *dev_id)
+{
+	struct wqt_dev *priv = dev_id;
+	u32 imisr, idr;
+	unsigned long flags;
+
+	imisr = IMMR_R32(IMISR_OFFSET);
+	idr = IMMR_R32(IDR_OFFSET);
+
+	if (!(imisr & 0x8))
+		return IRQ_NONE;
+
+	/* Clear all of the interrupt sources, we'll handle them next */
+	IMMR_W32(IDR_OFFSET, idr);
+
+	/* Lock over all of the handlers, so they cannot get called when
+	 * the code doesn't expect them to be called */
+	spin_lock_irqsave(&priv->irq_lock, flags);
+
+	if (idr & UART_RX_READY_DBELL)
+		priv->handlers.uart_rx_ready_handler(priv);
+
+	if (idr & UART_TX_EMPTY_DBELL)
+		priv->handlers.uart_tx_empty_handler(priv);
+
+	if (idr & NET_RX_PACKET_DBELL)
+		priv->handlers.net_rx_packet_handler(priv);
+
+	if (idr & NET_TX_COMPLETE_DBELL)
+		priv->handlers.net_tx_complete_handler(priv);
+
+	if (idr & NET_START_REQ_DBELL)
+		priv->handlers.net_start_req_handler(priv);
+
+	if (idr & NET_START_ACK_DBELL)
+		priv->handlers.net_start_ack_handler(priv);
+
+	if (idr & NET_STOP_REQ_DBELL)
+		priv->handlers.net_stop_req_handler(priv);
+
+	if (idr & NET_STOP_ACK_DBELL)
+		priv->handlers.net_stop_ack_handler(priv);
+
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+
+	return IRQ_HANDLED;
+}
+
+/* Send a character through the mbox when it becomes available
+ * Blocking, must not be called with any spinlocks held */
+static int do_send_message(struct wqt_dev *priv, const char ch)
+{
+	struct uart_port *port = &priv->port;
+	bool tmp;
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->irq_lock, flags);
+	while (priv->uart_tx_ready != true) {
+		spin_unlock_irqrestore(&priv->irq_lock, flags);
+		wait_event_timeout(priv->uart_tx_wait, priv->uart_tx_ready, HZ);
+
+		spin_lock_irqsave(&port->lock, flags);
+		tmp = priv->uart_open;
+		spin_unlock_irqrestore(&port->lock, flags);
+
+		if (!tmp)
+			return -EIO;
+
+		spin_lock_irqsave(&priv->irq_lock, flags);
+	}
+
+	/* Now the transmitter is free, send the message */
+	IMMR_W32(OMR0_OFFSET, ch);
+	IMMR_W32(ODR_OFFSET, UART_RX_READY_DBELL);
+
+	/* Mark the transmitter busy */
+	priv->uart_tx_ready = false;
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+	return 0;
+}
+
+/* Grab a character out of the uart tx buffer and send it */
+static void uart_tx_work_fn(struct work_struct *work)
+{
+	struct wqt_dev *priv = container_of(work, struct wqt_dev, uart_tx_work);
+	struct uart_port *port = &priv->port;
+	struct circ_buf *xmit = &port->info->xmit;
+	char ch;
+	unsigned long flags;
+
+	spin_lock_irqsave(&port->lock, flags);
+	while (true) {
+
+		/* Check for XON/XOFF (high priority) */
+		if (port->x_char) {
+			ch = port->x_char;
+			port->x_char = 0;
+			spin_unlock_irqrestore(&port->lock, flags);
+
+			if (do_send_message(priv, ch))
+				return;
+
+			spin_lock_irqsave(&port->lock, flags);
+			continue;
+		}
+
+		/* If we're out of chars or the port is stopped, we're done */
+		if (uart_circ_empty(xmit) || uart_tx_stopped(port)) {
+			wqtuart_stop_tx(port);
+			break;
+		}
+
+		/* Grab the next char out of the buffer and send it */
+		ch = xmit->buf[xmit->tail];
+		xmit->tail = (xmit->tail + 1) & (UART_XMIT_SIZE - 1);
+		spin_unlock_irqrestore(&port->lock, flags);
+
+		if (do_send_message(priv, ch))
+			return;
+
+		spin_lock_irqsave(&port->lock, flags);
+	}
+
+	if (uart_circ_chars_pending(xmit) < WAKEUP_CHARS)
+		uart_write_wakeup(port);
+
+	if (uart_circ_empty(xmit))
+		wqtuart_stop_tx(port);
+
+	spin_unlock_irqrestore(&port->lock, flags);
+}
+
+/*----------------------------------------------------------------------------*/
+/* Interrupt Handlers                                                         */
+/*----------------------------------------------------------------------------*/
+
+/* NOTE: All handlers are called with priv->irq_lock held */
+
+static void empty_handler(struct wqt_dev *priv)
+{
+	/* Intentionally left empty */
+}
+
+static void net_start_req_handler(struct wqt_dev *priv)
+{
+	schedule_work(&priv->net_start_work);
+}
+
+static void net_start_ack_handler(struct wqt_dev *priv)
+{
+	complete(&priv->net_start_completion);
+}
+
+static void net_stop_req_handler(struct wqt_dev *priv)
+{
+	schedule_work(&priv->net_stop_work);
+}
+
+static void net_stop_ack_handler(struct wqt_dev *priv)
+{
+	complete(&priv->net_stop_completion);
+}
+
+static void net_tx_complete_handler(struct wqt_dev *priv)
+{
+	tasklet_schedule(&priv->tx_complete_tasklet);
+}
+
+static void net_rx_packet_handler(struct wqt_dev *priv)
+{
+	wqtstatus_setbit(priv, PCINET_NET_RXINT_OFF);
+	netif_rx_schedule(priv->ndev, &priv->napi);
+}
+
+static void uart_rx_ready_handler(struct wqt_dev *priv)
+{
+	wqtuart_rx_char(&priv->port, IMMR_R32(IMR0_OFFSET) & 0xff);
+	IMMR_W32(ODR_OFFSET, UART_TX_EMPTY_DBELL);
+}
+
+static void uart_tx_empty_handler(struct wqt_dev *priv)
+{
+	priv->uart_tx_ready = true;
+	wake_up(&priv->uart_tx_wait);
+}
+
+/*----------------------------------------------------------------------------*/
+/* Interrupt Request / Free Helpers                                           */
+/*----------------------------------------------------------------------------*/
+
+static void do_enable_net_startstop_handlers(struct wqt_dev *priv)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->irq_lock, flags);
+	priv->handlers.net_start_req_handler = net_start_req_handler;
+	priv->handlers.net_start_ack_handler = net_start_ack_handler;
+	priv->handlers.net_stop_req_handler = net_stop_req_handler;
+	priv->handlers.net_stop_ack_handler = net_stop_ack_handler;
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+
+	wqtstatus_setbit(priv, PCINET_NET_STATUS_RUNNING);
+}
+
+static void do_disable_net_startstop_handlers(struct wqt_dev *priv)
+{
+	unsigned long flags;
+
+	wqtstatus_clrbit(priv, PCINET_NET_STATUS_RUNNING);
+
+	spin_lock_irqsave(&priv->irq_lock, flags);
+	priv->handlers.net_start_req_handler = empty_handler;
+	priv->handlers.net_start_ack_handler = empty_handler;
+	priv->handlers.net_stop_req_handler = empty_handler;
+	priv->handlers.net_stop_ack_handler = empty_handler;
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+}
+
+static void do_enable_net_rxtx_handlers(struct wqt_dev *priv)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->irq_lock, flags);
+	priv->handlers.net_rx_packet_handler = net_rx_packet_handler;
+	priv->handlers.net_tx_complete_handler = net_tx_complete_handler;
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+}
+
+static void do_disable_net_rxtx_handlers(struct wqt_dev *priv)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->irq_lock, flags);
+	priv->handlers.net_rx_packet_handler = empty_handler;
+	priv->handlers.net_tx_complete_handler = empty_handler;
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+}
+
+static void do_enable_uart_handlers(struct wqt_dev *priv)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->irq_lock, flags);
+	priv->handlers.uart_rx_ready_handler = uart_rx_ready_handler;
+	priv->handlers.uart_tx_empty_handler = uart_tx_empty_handler;
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+}
+
+static void do_disable_uart_handlers(struct wqt_dev *priv)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->irq_lock, flags);
+	priv->handlers.uart_rx_ready_handler = empty_handler;
+	priv->handlers.uart_tx_empty_handler = empty_handler;
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+}
+
+static int wqt_request_irq(struct wqt_dev *priv)
+{
+	int ret = 0;
+
+	mutex_lock(&priv->irq_mutex);
+
+	if (priv->interrupt_count > 0)
+		goto out_unlock;
+
+	/* Force all handlers to be disabled before attaching the handler */
+	do_disable_net_startstop_handlers(priv);
+	do_disable_net_rxtx_handlers(priv);
+	do_disable_uart_handlers(priv);
+
+	ret = request_irq(priv->irq,
+			  wqt_interrupt,
+			  IRQF_SHARED,
+			  priv->ndev->name,
+			  priv);
+
+out_unlock:
+	priv->interrupt_count++;
+	mutex_unlock(&priv->irq_mutex);
+
+	return ret;
+}
+
+static void wqt_free_irq(struct wqt_dev *priv)
+{
+	mutex_lock(&priv->irq_mutex);
+	priv->interrupt_count--;
+
+	if (priv->interrupt_count > 0)
+		goto out_unlock;
+
+	free_irq(priv->irq, priv);
+
+out_unlock:
+	mutex_unlock(&priv->irq_mutex);
+}
+
+/*----------------------------------------------------------------------------*/
+/* Network Startup and Shutdown Helpers                                       */
+/*----------------------------------------------------------------------------*/
+
+/* NOTE: All helper functions prefixed with "do" must be called only from
+ * process context, with priv->net_mutex held. They are expected to sleep */
+
+static void do_net_start_queues(struct wqt_dev *priv)
+{
+	if (priv->net_state == NET_STATE_RUNNING)
+		return;
+
+	dev_dbg(priv->dev, "resetting buffer positions\n");
+	priv->cur_rx = priv->rx_base;
+	priv->cur_tx = priv->tx_base;
+	priv->dirty_tx = priv->tx_base;
+	priv->tx_free = PH_RING_SIZE;
+
+	dev_dbg(priv->dev, "enabling NAPI queue\n");
+	napi_enable(&priv->napi);
+
+	dev_dbg(priv->dev, "enabling tx_complete() tasklet\n");
+	tasklet_enable(&priv->tx_complete_tasklet);
+
+	dev_dbg(priv->dev, "enabling TX queue\n");
+	netif_start_queue(priv->ndev);
+
+	dev_dbg(priv->dev, "carrier on!\n");
+	netif_carrier_on(priv->ndev);
+
+	/* Enable the RX_PACKET and TX_COMPLETE interrupt handlers */
+	do_enable_net_rxtx_handlers(priv);
+
+	priv->net_state = NET_STATE_RUNNING;
+}
+
+static void do_net_stop_queues(struct wqt_dev *priv)
+{
+	if (priv->net_state == NET_STATE_STOPPED)
+		return;
+
+	/* Disable the RX_PACKET and TX_COMPLETE interrupt handlers */
+	do_disable_net_rxtx_handlers(priv);
+
+	dev_dbg(priv->dev, "disabling NAPI queue\n");
+	napi_disable(&priv->napi);
+
+	dev_dbg(priv->dev, "disabling tx_complete() tasklet\n");
+	tasklet_disable(&priv->tx_complete_tasklet);
+
+	dev_dbg(priv->dev, "disabling TX queue\n");
+	netif_tx_disable(priv->ndev);
+
+	dev_dbg(priv->dev, "carrier off!\n");
+	netif_carrier_off(priv->ndev);
+
+	priv->net_state = NET_STATE_STOPPED;
+}
+
+/* Called when we get a request to start our queues and acknowledge */
+static void wqtnet_start_work_fn(struct work_struct *work)
+{
+	struct wqt_dev *priv = container_of(work, struct wqt_dev,
+					    net_start_work);
+
+	mutex_lock(&priv->net_mutex);
+
+	do_net_start_queues(priv);
+	IMMR_W32(ODR_OFFSET, NET_START_ACK_DBELL);
+
+	mutex_unlock(&priv->net_mutex);
+}
+
+/* Called when we get a request to stop our queues and acknowledge */
+static void wqtnet_stop_work_fn(struct work_struct *work)
+{
+	struct wqt_dev *priv = container_of(work, struct wqt_dev,
+					    net_stop_work);
+
+	mutex_lock(&priv->net_mutex);
+
+	do_net_stop_queues(priv);
+	IMMR_W32(ODR_OFFSET, NET_STOP_ACK_DBELL);
+
+	mutex_unlock(&priv->net_mutex);
+}
+
+/*----------------------------------------------------------------------------*/
+/* DMA Operation Helpers                                                      */
+/*----------------------------------------------------------------------------*/
+
+/* Setup a static 1GB window starting at PCI address 0x0
+ *
+ * This means that all DMA must be within the first 1GB of the other side's
+ * memory, which shouldn't be a problem
+ */
+static int wqtdma_setup_outbound_window(struct wqt_dev *priv)
+{
+	IMMR_W32BE(LAWAR0_OFFSET, LAWAR0_ENABLE | 0x1d);
+	IMMR_W32BE(POCMR0_OFFSET, POCMR0_ENABLE | 0xc0000);
+	IMMR_W32BE(POTAR0_OFFSET, 0x0);
+
+	return 0;
+}
+
+static enum dma_state_client dmatest_event(struct dma_client *client,
+					   struct dma_chan *chan,
+					   enum dma_state state)
+{
+	struct wqt_dev *priv = container_of(client, struct wqt_dev, client);
+	enum dma_state_client ack = DMA_NAK;
+
+	switch (state) {
+	case DMA_RESOURCE_AVAILABLE:
+		if (chan == priv->chan)
+			ack = DMA_DUP;
+		else if (priv->chan)
+			ack = DMA_NAK;
+		else {
+			priv->chan = chan;
+			ack = DMA_ACK;
+		}
+
+		break;
+
+	case DMA_RESOURCE_REMOVED:
+		priv->chan = NULL;
+		ack = DMA_ACK;
+		break;
+
+	default:
+		dev_dbg(priv->dev, "unhandled DMA event %u (%s)\n",
+				state, chan->dev.bus_id);
+		break;
+	}
+
+	return ack;
+}
+
+static dma_cookie_t dma_async_memcpy_raw_to_buf(struct dma_chan *chan,
+						void *dest,
+						dma_addr_t src,
+						size_t len)
+{
+	struct dma_device *dev = chan->device;
+	struct dma_async_tx_descriptor *tx;
+	dma_addr_t dma_dest, dma_src;
+	dma_cookie_t cookie;
+	int cpu;
+
+	dma_src = src;
+	dma_dest = dma_map_single(dev->dev, dest, len, DMA_FROM_DEVICE);
+	tx = dev->device_prep_dma_memcpy(chan, dma_dest, dma_src, len,
+					 DMA_CTRL_ACK);
+
+	if (!tx) {
+		dma_unmap_single(dev->dev, dma_dest, len, DMA_FROM_DEVICE);
+		return -ENOMEM;
+	}
+
+	tx->callback = NULL;
+	cookie = tx->tx_submit(tx);
+
+	cpu = get_cpu();
+	per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
+	per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+	put_cpu();
+
+	return cookie;
+}
+
+static dma_cookie_t dma_async_memcpy_buf_to_raw(struct dma_chan *chan,
+						dma_addr_t dest,
+						void *src,
+						size_t len)
+{
+	struct dma_device *dev = chan->device;
+	struct dma_async_tx_descriptor *tx;
+	dma_addr_t dma_dest, dma_src;
+	dma_cookie_t cookie;
+	int cpu;
+
+	dma_src = dma_map_single(dev->dev, src, len, DMA_TO_DEVICE);
+	dma_dest = dest;
+	tx = dev->device_prep_dma_memcpy(chan, dma_dest, dma_src, len,
+					 DMA_CTRL_ACK);
+
+	if (!tx) {
+		dma_unmap_single(dev->dev, dma_src, len, DMA_TO_DEVICE);
+		return -ENOMEM;
+	}
+
+	tx->callback = NULL;
+	cookie = tx->tx_submit(tx);
+
+	cpu = get_cpu();
+	per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
+	per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+	put_cpu();
+
+	return cookie;
+}
+
+/*----------------------------------------------------------------------------*/
+/* Network Device Operations                                                  */
+/*----------------------------------------------------------------------------*/
+
+static int wqt_open(struct net_device *dev)
+{
+	struct wqt_dev *priv = netdev_priv(dev);
+	int ret;
+
+	/* Pretend the cable is unplugged until we are up and running */
+	netif_carrier_off(dev);
+
+	mutex_lock(&priv->net_mutex);
+
+	ret = wqt_request_irq(priv);
+
+	if (ret)
+		goto out_unlock;
+
+	/* Enable only the network start/stop interrupts */
+	do_enable_net_startstop_handlers(priv);
+
+	/* Check if the other side is running, if not, it will start us.
+	 * Without the interrupt handler installed, there's no way it can
+	 * respond to us anyway */
+	if (!wqtstatus_remote_testbit(priv, PCINET_NET_STATUS_RUNNING)) {
+		ret = 0;
+		goto out_unlock;
+	}
+
+	IMMR_W32(ODR_OFFSET, NET_START_REQ_DBELL);
+	ret = wait_for_completion_timeout(&priv->net_start_completion, 5*HZ);
+
+	if (!ret) {
+		/* Our start request timed out, therefore, the other
+		 * side will start us when it comes back up */
+		dev_dbg(priv->dev, "start timed out\n");
+	} else {
+		do_net_start_queues(priv);
+		ret = 0;
+	}
+
+out_unlock:
+	mutex_unlock(&priv->net_mutex);
+	return ret;
+}
+
+static int wqt_stop(struct net_device *dev)
+{
+	struct wqt_dev *priv = netdev_priv(dev);
+	int ret;
+
+	mutex_lock(&priv->net_mutex);
+
+	do_net_stop_queues(priv);
+
+	IMMR_W32(ODR_OFFSET, NET_STOP_REQ_DBELL);
+	ret = wait_for_completion_timeout(&priv->net_stop_completion, 5*HZ);
+
+	if (!ret)
+		dev_warn(priv->dev, "other side did not stop on time!\n");
+	else
+		ret = 0;
+
+	do_disable_net_startstop_handlers(priv);
+	wqt_free_irq(priv);
+
+	mutex_unlock(&priv->net_mutex);
+	return 0;
+}
+
+static int wqt_change_mtu(struct net_device *dev, int new_mtu)
+{
+	if ((new_mtu < 68) || (new_mtu > PH_MAX_MTU))
+		return -EINVAL;
+
+	dev->mtu = new_mtu;
+	return 0;
+}
+
+static int wqt_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct wqt_dev *priv = netdev_priv(dev);
+	dma_cookie_t cookie;
+	enum dma_status status;
+	cbd_t __iomem *bdp;
+	int dirty_idx;
+
+	spin_lock_bh(&priv->net_lock);
+
+	bdp = priv->cur_tx;
+	dirty_idx = bdp - priv->tx_base;
+
+	/* This should not happen, the queue should be stopped */
+	if (priv->tx_free == 0 || CBDR_SC(bdp) != BD_MEM_READY) {
+		netif_stop_queue(dev);
+		spin_unlock_bh(&priv->net_lock);
+		return NETDEV_TX_BUSY;
+	}
+
+	cookie = dma_async_memcpy_buf_to_raw(priv->chan,
+					     (dma_addr_t)(0x80000000 + CBDR_ADDR(bdp)),
+					     skb->data,
+					     skb->len);
+
+	if (dma_submit_error(cookie)) {
+		dev_warn(priv->dev, "DMA submit error\n");
+		spin_unlock_bh(&priv->net_lock);
+		return -ENOMEM;
+	}
+
+	status = dma_sync_wait(priv->chan, cookie);
+
+	if (status == DMA_ERROR) {
+		dev_warn(priv->dev, "DMA error\n");
+		spin_unlock_bh(&priv->net_lock);
+		return -EIO;
+	}
+
+	CBDW_LEN(bdp, skb->len);
+	CBDW_SC(bdp, BD_MEM_DIRTY);
+
+	if (dirty_idx == PH_RING_SIZE - 1)
+		bdp = priv->tx_base;
+	else
+		bdp++;
+
+	priv->tx_skbs[dirty_idx] = skb;
+	priv->cur_tx = bdp;
+	priv->tx_free--;
+	dev->trans_start = jiffies;
+
+	if (priv->tx_free == 0)
+		netif_stop_queue(dev);
+
+	if (!wqtstatus_remote_testbit(priv, PCINET_NET_RXINT_OFF))
+		IMMR_W32(ODR_OFFSET, NET_RX_PACKET_DBELL);
+
+	spin_unlock_bh(&priv->net_lock);
+	return NETDEV_TX_OK;
+}
+
+static void wqt_tx_timeout(struct net_device *dev)
+{
+	struct wqt_dev *priv = netdev_priv(dev);
+
+	dev->stats.tx_errors++;
+	IMMR_W32(ODR_OFFSET, NET_RX_PACKET_DBELL);
+}
+
+static void wqt_tx_complete(unsigned long data)
+{
+	struct net_device *dev = (struct net_device *)data;
+	struct wqt_dev *priv = netdev_priv(dev);
+	struct sk_buff *skb;
+	cbd_t __iomem *bdp;
+	int do_wake, dirty_idx;
+
+	spin_lock_bh(&priv->net_lock);
+
+	bdp = priv->dirty_tx;
+	do_wake = 0;
+
+	while (CBDR_SC(bdp) == BD_MEM_FREE) {
+		dirty_idx = bdp - priv->tx_base;
+
+		skb = priv->tx_skbs[dirty_idx];
+
+		BUG_ON(skb == NULL);
+
+		dev->stats.tx_bytes += skb->len;
+		dev->stats.tx_packets++;
+
+		dev_kfree_skb_irq(skb);
+
+		priv->tx_skbs[dirty_idx] = NULL;
+
+		/* Mark the BDP as ready */
+		CBDW_SC(bdp, BD_MEM_READY);
+
+		/* Update the bdp */
+		if (dirty_idx == PH_RING_SIZE - 1)
+			bdp = priv->tx_base;
+		else
+			bdp++;
+
+		if (!priv->tx_free++)
+			do_wake = 1;
+	}
+
+	priv->dirty_tx = bdp;
+
+	spin_unlock_bh(&priv->net_lock);
+
+	if (do_wake)
+		netif_wake_queue(dev);
+}
+
+static int wqt_rx_napi(struct napi_struct *napi, int budget)
+{
+	struct wqt_dev *priv = container_of(napi, struct wqt_dev, napi);
+	struct net_device *dev = priv->ndev;
+	int received = 0;
+	struct sk_buff *skb;
+	dma_addr_t remote_addr;
+	dma_cookie_t cookie;
+	enum dma_status status;
+	int pkt_len, dirty_idx;
+	cbd_t __iomem *bdp;
+
+	bdp = priv->cur_rx;
+
+	while (CBDR_SC(bdp) == BD_MEM_DIRTY) {
+		dirty_idx = bdp - priv->rx_base;
+
+		pkt_len = CBDR_LEN(bdp);
+		remote_addr = CBDR_ADDR(bdp);
+
+		/* Allocate a packet for the data */
+		skb = dev_alloc_skb(pkt_len + NET_IP_ALIGN);
+
+		if (skb == NULL) {
+			dev->stats.rx_dropped++;
+			goto out_err;
+		}
+
+		skb_reserve(skb, NET_IP_ALIGN);
+
+		cookie = dma_async_memcpy_raw_to_buf(priv->chan,
+						     skb->data,
+						     (dma_addr_t)(0x80000000 + remote_addr),
+						     pkt_len);
+
+		if (dma_submit_error(cookie)) {
+			dev_warn(priv->dev, "DMA submit error\n");
+			dev_kfree_skb_irq(skb);
+			dev->stats.rx_dropped++;
+			goto out_err;
+		}
+
+		status = dma_sync_wait(priv->chan, cookie);
+
+		if (status == DMA_ERROR) {
+			dev_warn(priv->dev, "DMA error\n");
+			dev_kfree_skb_irq(skb);
+			dev->stats.rx_dropped++;
+			goto out_err;
+		}
+
+		/* Push the packet into the network stack */
+		skb_put(skb, pkt_len);
+		skb->protocol = eth_type_trans(skb, dev);
+#ifdef CONFIG_PCINET_DISABLE_CHECKSUM
+		skb->ip_summed = CHECKSUM_UNNECESSARY;
+#else
+		skb->ip_summed = CHECKSUM_NONE;
+#endif
+		netif_receive_skb(skb);
+		received++;
+		dev->stats.rx_bytes += pkt_len;
+		dev->stats.rx_packets++;
+
+out_err:
+		CBDW_SC(bdp, BD_MEM_FREE);
+
+		if (dirty_idx == PH_RING_SIZE - 1)
+			bdp = priv->rx_base;
+		else
+			bdp++;
+
+		if (received >= budget)
+			break;
+	}
+
+	priv->cur_rx = bdp;
+
+	/* We have processed all packets that the adapter had, but it
+	 * was less than our budget, stop polling */
+	if (received < budget) {
+		netif_rx_complete(dev, napi);
+		wqtstatus_clrbit(priv, PCINET_NET_RXINT_OFF);
+	}
+
+	IMMR_W32(ODR_OFFSET, NET_TX_COMPLETE_DBELL);
+
+	return received;
+}
+
+/*----------------------------------------------------------------------------*/
+/* UART Device Operations                                                     */
+/*----------------------------------------------------------------------------*/
+
+static unsigned int wqtuart_tx_empty(struct uart_port *port)
+{
+	return TIOCSER_TEMT;
+}
+
+static void wqtuart_set_mctrl(struct uart_port *port, unsigned int mctrl)
+{
+}
+
+static unsigned int wqtuart_get_mctrl(struct uart_port *port)
+{
+	return TIOCM_CAR | TIOCM_DSR | TIOCM_CTS;
+}
+
+static void wqtuart_stop_tx(struct uart_port *port)
+{
+}
+
+static void wqtuart_start_tx(struct uart_port *port)
+{
+	struct wqt_dev *priv = container_of(port, struct wqt_dev, port);
+
+	queue_work(priv->wq, &priv->uart_tx_work);
+}
+
+static void wqtuart_stop_rx(struct uart_port *port)
+{
+	struct wqt_dev *priv = container_of(port, struct wqt_dev, port);
+
+	do_disable_uart_handlers(priv);
+	priv->uart_rx_enabled = false;
+	wqtstatus_clrbit(priv, PCINET_UART_RX_ENABLED);
+}
+
+static void wqtuart_enable_ms(struct uart_port *port)
+{
+}
+
+static void wqtuart_break_ctl(struct uart_port *port, int break_state)
+{
+}
+
+static int wqtuart_startup(struct uart_port *port)
+{
+	struct wqt_dev *priv = container_of(port, struct wqt_dev, port);
+	int ret;
+
+	ret = wqt_request_irq(priv);
+
+	if (ret)
+		return ret;
+
+	do_enable_uart_handlers(priv);
+
+	/* Mark the transmitter and receiver ready */
+	priv->uart_tx_ready = true;
+	priv->uart_rx_enabled = true;
+	wqtstatus_setbit(priv, PCINET_UART_RX_ENABLED);
+
+	/* Let the other side know that we are ready to receive chars now */
+	IMMR_W32(ODR_OFFSET, UART_TX_EMPTY_DBELL);
+	priv->uart_open = true;
+	return 0;
+}
+
+static void wqtuart_shutdown(struct uart_port *port)
+{
+	struct wqt_dev *priv = container_of(port, struct wqt_dev, port);
+
+	wqt_free_irq(priv);
+
+	/* Make sure the uart_tx_work_fn() exits cleanly */
+	priv->uart_open = false;
+	wake_up(&priv->uart_tx_wait);
+}
+
+static void wqtuart_set_termios(struct uart_port *port,
+			       struct ktermios *termios,
+			       struct ktermios *old)
+{
+}
+
+static const char *wqtuart_type(struct uart_port *port)
+{
+	return "WQTUART";
+}
+
+static int wqtuart_request_port(struct uart_port *port)
+{
+	return 0;
+}
+
+static void wqtuart_config_port(struct uart_port *port, int flags)
+{
+}
+
+static void wqtuart_release_port(struct uart_port *port)
+{
+}
+
+static int wqtuart_verify_port(struct uart_port *port,
+			      struct serial_struct *ser)
+{
+	return 0;
+}
+
+static void wqtuart_rx_char(struct uart_port *port, const char ch)
+{
+	struct wqt_dev *priv = container_of(port, struct wqt_dev, port);
+	struct tty_struct *tty;
+	unsigned long flags;
+
+	spin_lock_irqsave(&port->lock, flags);
+
+	if (priv->uart_rx_enabled) {
+		tty = port->info->port.tty;
+		tty_insert_flip_char(tty, ch, TTY_NORMAL);
+		tty_flip_buffer_push(tty);
+	}
+
+	spin_unlock_irqrestore(&port->lock, flags);
+}
+
+static struct uart_ops wqtuart_ops = {
+	.tx_empty	= wqtuart_tx_empty,
+	.set_mctrl	= wqtuart_set_mctrl,
+	.get_mctrl	= wqtuart_get_mctrl,
+	.stop_tx	= wqtuart_stop_tx,
+	.start_tx	= wqtuart_start_tx,
+	.stop_rx	= wqtuart_stop_rx,
+	.enable_ms	= wqtuart_enable_ms,
+	.break_ctl	= wqtuart_break_ctl,
+	.startup	= wqtuart_startup,
+	.shutdown	= wqtuart_shutdown,
+	.set_termios	= wqtuart_set_termios,
+	.type		= wqtuart_type,
+	.release_port	= wqtuart_release_port,
+	.request_port	= wqtuart_request_port,
+	.config_port	= wqtuart_config_port,
+	.verify_port	= wqtuart_verify_port,
+};
+
+static struct uart_driver wqtuart_driver = {
+	.owner		= THIS_MODULE,
+	.driver_name	= driver_name,
+	.dev_name	= "ttyPCI",
+	.major		= 240,
+	.minor		= 0,
+	.nr		= 1,
+};
+
+/*----------------------------------------------------------------------------*/
+/* Network Registers                                                          */
+/*----------------------------------------------------------------------------*/
+
+static void wqt_free_netregs(struct wqt_dev *priv)
+{
+	BUG_ON(priv->netregs == NULL);
+	BUG_ON(priv->netregs_addr == 0x0);
+
+	dma_free_coherent(priv->dev,
+			  PAGE_SIZE,
+			  priv->netregs,
+			  priv->netregs_addr);
+
+	priv->netregs = NULL;
+	priv->netregs_addr = 0x0;
+}
+
+static int wqt_init_netregs(struct wqt_dev *priv)
+{
+	u32 val;
+
+	BUG_ON(priv->netregs != NULL);
+	BUG_ON(priv->netregs_addr != 0x0);
+
+	/* Check the PCI Inbound Window Attributes Register 0 for a 4k window
+	 * This is PCI BAR1, and will be used as network device registers */
+	val = IMMR_R32BE(PIWAR0_OFFSET);
+	val = val & (PIWAR0_ENABLED | PIWAR0_IWS_4K);
+
+	if (val != (PIWAR0_ENABLED | PIWAR0_IWS_4K)) {
+		dev_dbg(priv->dev, "PIWAR0 set up incorrectly\n");
+		return -ENODEV;
+	}
+
+	priv->netregs = dma_alloc_coherent(priv->dev,
+					   PAGE_SIZE,
+					   &priv->netregs_addr,
+					   GFP_KERNEL);
+
+	if (!priv->netregs) {
+		dev_dbg(priv->dev, "Unable to allocate netregs\n");
+		return -ENOMEM;
+	}
+
+	/* Write the page address into the address register */
+	IMMR_W32BE(PITAR0_OFFSET, priv->netregs_addr >> 12);
+	return 0;
+}
+
+/*----------------------------------------------------------------------------*/
+/* OpenFirmware Device Subsystem                                              */
+/*----------------------------------------------------------------------------*/
+
+static int wqt_probe(struct of_device *op, const struct of_device_id *match)
+{
+	struct net_device *ndev;
+	struct wqt_dev *priv;
+	int ret;
+
+	ndev = alloc_etherdev(sizeof(*priv));
+
+	if (!ndev) {
+		ret = -ENOMEM;
+		goto out_alloc_ndev;
+	}
+
+	dev_set_drvdata(&op->dev, ndev);
+	priv = netdev_priv(ndev);
+	priv->op = op;
+	priv->dev = &op->dev;
+	priv->ndev = ndev;
+
+	spin_lock_init(&priv->irq_lock);
+	mutex_init(&priv->irq_mutex);
+
+	/* Hardware Initialization */
+	priv->irq = irq_of_parse_and_map(op->node, 0);
+	priv->immr = ioremap(0xe0000000, 0x100000);
+
+	if (!priv->immr) {
+		ret = -ENOMEM;
+		goto out_ioremap_immr;
+	}
+
+	ret = wqt_init_netregs(priv);
+
+	if (ret)
+		goto out_init_netregs;
+
+	/* NOTE: Yes, this is correct. Everything was written as if this
+	 * NOTE: side *is* a network card. So the place the card is
+	 * NOTE: receiving from is the other side's TX buffers */
+	priv->rx_base = priv->netregs + PCINET_TXBD_BASE;
+	priv->tx_base = priv->netregs + PCINET_RXBD_BASE;
+	wqtstatus_setbit(priv, PCINET_NET_REGISTERS_VALID);
+
+	/* DMA Client */
+	wqtdma_setup_outbound_window(priv);
+	priv->client.event_callback = dmatest_event;
+	dma_cap_set(DMA_MEMCPY, priv->client.cap_mask);
+	dma_async_client_register(&priv->client);
+	dma_async_client_chan_request(&priv->client);
+
+	/* Initialize private data */
+	priv->wq = create_singlethread_workqueue(driver_name);
+
+	if (!priv->wq) {
+		ret = -ENOMEM;
+		goto out_create_workqueue;
+	}
+
+	INIT_WORK(&priv->uart_tx_work, uart_tx_work_fn);
+	init_waitqueue_head(&priv->uart_tx_wait);
+	priv->uart_tx_ready = true;
+
+	tasklet_init(&priv->tx_complete_tasklet, wqt_tx_complete,
+		     (unsigned long)ndev);
+	tasklet_disable(&priv->tx_complete_tasklet);
+	spin_lock_init(&priv->net_lock);
+
+	mutex_init(&priv->net_mutex);
+	priv->net_state = NET_STATE_STOPPED;
+	INIT_WORK(&priv->net_start_work, wqtnet_start_work_fn);
+	INIT_WORK(&priv->net_stop_work, wqtnet_stop_work_fn);
+	init_completion(&priv->net_start_completion);
+	init_completion(&priv->net_stop_completion);
+
+	/* Mask all of the MBOX interrupts */
+	IMMR_W32(IMIMR_OFFSET, 0x1 | 0x2);
+
+	/* Network Device */
+	random_ether_addr(ndev->dev_addr);
+
+	ndev->open              = wqt_open;
+	ndev->stop              = wqt_stop;
+	ndev->change_mtu        = wqt_change_mtu;
+	ndev->hard_start_xmit   = wqt_hard_start_xmit;
+	ndev->tx_timeout        = wqt_tx_timeout;
+	ndev->watchdog_timeo    = HZ/4;
+	ndev->flags            &= ~IFF_MULTICAST;  /* No multicast support */
+#ifdef CONFIG_PCINET_DISABLE_CHECKSUM
+	ndev->features         |= NETIF_F_NO_CSUM; /* No checksum needed */
+#endif
+	ndev->mtu               = PH_MAX_MTU;
+	netif_napi_add(ndev, &priv->napi, wqt_rx_napi, PH_RING_SIZE);
+
+	ret = register_netdev(ndev);
+
+	if (ret)
+		goto out_register_netdev;
+
+	/* UART Device */
+	priv->port.ops = &wqtuart_ops;
+	priv->port.type = PORT_16550A;
+	priv->port.dev = &op->dev;
+	priv->port.line = 0;
+	spin_lock_init(&priv->port.lock);
+
+	ret = uart_add_one_port(&wqtuart_driver, &priv->port);
+
+	if (ret)
+		goto out_add_uart_port;
+
+	dev_info(priv->dev, "using ethernet device %s\n", ndev->name);
+	dev_info(priv->dev, "using serial device %s%d\n",
+			wqtuart_driver.dev_name, priv->port.line);
+	return 0;
+
+out_add_uart_port:
+	unregister_netdev(ndev);
+out_register_netdev:
+	destroy_workqueue(priv->wq);
+out_create_workqueue:
+	wqt_free_netregs(priv);
+out_init_netregs:
+	iounmap(priv->immr);
+out_ioremap_immr:
+	free_netdev(ndev);
+out_alloc_ndev:
+	return ret;
+}
+
+static int wqt_remove(struct of_device *op)
+{
+	struct net_device *ndev = dev_get_drvdata(&op->dev);
+	struct wqt_dev *priv = netdev_priv(ndev);
+
+	uart_remove_one_port(&wqtuart_driver, &priv->port);
+	unregister_netdev(priv->ndev);
+
+	flush_workqueue(priv->wq);
+	destroy_workqueue(priv->wq);
+
+	wqtstatus_clrbit(priv, PCINET_NET_REGISTERS_VALID);
+	wqt_free_netregs(priv);
+
+	dma_async_client_unregister(&priv->client);
+
+	iounmap(priv->immr);
+
+	free_netdev(ndev);
+
+	return 0;
+}
+
+static struct of_device_id wqt_match[] = {
+	{ .compatible = "fsl,mpc8349-mu", },
+	{},
+};
+
+static struct of_platform_driver wqt_of_driver = {
+	.owner		= THIS_MODULE,
+	.name		= driver_name,
+	.match_table	= wqt_match,
+	.probe		= wqt_probe,
+	.remove		= wqt_remove,
+};
+
+/*----------------------------------------------------------------------------*/
+/* DMA Client Infrastructure                                                  */
+/*----------------------------------------------------------------------------*/
+
+/*----------------------------------------------------------------------------*/
+/* Module Init / Exit                                                         */
+/*----------------------------------------------------------------------------*/
+
+static int __init wqt_init(void)
+{
+	int ret;
+
+	ret = uart_register_driver(&wqtuart_driver);
+
+	if (ret)
+		goto out_uart_register_driver;
+
+	ret = of_register_platform_driver(&wqt_of_driver);
+
+	if (ret)
+		goto out_of_register_platform_driver;
+
+	return 0;
+
+out_of_register_platform_driver:
+	uart_unregister_driver(&wqtuart_driver);
+out_uart_register_driver:
+	return ret;
+}
+
+static void __exit wqt_exit(void)
+{
+	of_unregister_platform_driver(&wqt_of_driver);
+	uart_unregister_driver(&wqtuart_driver);
+}
+
+MODULE_AUTHOR("Ira W. Snyder <iws@ovro.caltech.edu>");
+MODULE_DESCRIPTION("PCINet/PCISerial Driver for MPC8349EMDS");
+MODULE_LICENSE("GPL");
+
+module_init(wqt_init);
+module_exit(wqt_exit);
diff --git a/drivers/net/pcinet_host.c b/drivers/net/pcinet_host.c
new file mode 100644
index 0000000..95af93c
--- /dev/null
+++ b/drivers/net/pcinet_host.c
@@ -0,0 +1,1383 @@ 
+/*
+ * PCINet and PCISerial Driver for Freescale MPC8349EMDS (Host side)
+ *
+ * Copyright (c) 2008 Ira W. Snyder <iws@ovro.caltech.edu>
+ *
+ * Heavily inspired by the drivers/net/fs_enet driver
+ *
+ * This file is licensed under the terms of the GNU General Public License
+ * version 2. This program is licensed "as is" without any warranty of any
+ * kind, whether express or implied.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/wait.h>
+#include <linux/interrupt.h>
+#include <linux/irqreturn.h>
+#include <linux/pci.h>
+#include <linux/serial.h>
+#include <linux/serial_core.h>
+#include <linux/etherdevice.h>
+#include <linux/mutex.h>
+
+#include "pcinet.h"
+#include "pcinet_hw.h"
+
+static const char driver_name[] = "wqt";
+
+static void wqtuart_rx_char(struct uart_port *port, const char ch);
+static void wqtuart_stop_tx(struct uart_port *port);
+
+struct wqt_dev;
+typedef void (*wqt_irqhandler_t)(struct wqt_dev *);
+
+struct wqt_irqhandlers {
+	wqt_irqhandler_t net_start_req_handler;
+	wqt_irqhandler_t net_start_ack_handler;
+	wqt_irqhandler_t net_stop_req_handler;
+	wqt_irqhandler_t net_stop_ack_handler;
+	wqt_irqhandler_t net_rx_packet_handler;
+	wqt_irqhandler_t net_tx_complete_handler;
+	wqt_irqhandler_t uart_rx_ready_handler;
+	wqt_irqhandler_t uart_tx_empty_handler;
+};
+
+struct wqt_dev {
+	/*--------------------------------------------------------------------*/
+	/* PCI Infrastructure                                                 */
+	/*--------------------------------------------------------------------*/
+	struct pci_dev *pdev;
+	struct device *dev;
+	void __iomem *immr;
+
+	struct mutex irq_mutex;
+	int interrupt_count;
+
+	spinlock_t irq_lock;
+	struct wqt_irqhandlers handlers;
+
+	/*--------------------------------------------------------------------*/
+	/* UART Device Infrastructure                                         */
+	/*--------------------------------------------------------------------*/
+	struct uart_port port;
+	bool uart_rx_enabled;
+	bool uart_open;
+
+	struct workqueue_struct *wq;
+	struct work_struct uart_tx_work;
+	wait_queue_head_t uart_tx_wait; /* sleep for uart_tx_ready */
+	bool uart_tx_ready; /* transmitter state */
+
+	/*--------------------------------------------------------------------*/
+	/* Ethernet Device Infrastructure                                     */
+	/*--------------------------------------------------------------------*/
+	struct net_device *ndev;
+	void __iomem *netregs;
+
+	/* Outstanding SKB */
+	struct sk_buff *rx_skbs[PH_RING_SIZE];
+	struct sk_buff *tx_skbs[PH_RING_SIZE];
+	dma_addr_t rx_skb_addrs[PH_RING_SIZE];
+	dma_addr_t tx_skb_addrs[PH_RING_SIZE];
+
+	/* Circular Buffer Descriptor base */
+	cbd_t __iomem *rx_base;
+	cbd_t __iomem *tx_base;
+
+	/* Current SKB index */
+	cbd_t __iomem *cur_rx;
+	cbd_t __iomem *cur_tx;
+	cbd_t __iomem *dirty_tx;
+	int tx_free;
+
+	struct tasklet_struct tx_complete_tasklet;
+	spinlock_t net_lock;
+
+	struct mutex net_mutex;
+	int net_state;
+	struct work_struct net_start_work;
+	struct work_struct net_stop_work;
+	struct completion net_start_completion;
+	struct completion net_stop_completion;
+	struct napi_struct napi;
+};
+
+/*----------------------------------------------------------------------------*/
+/* Status Register Helper Operations                                          */
+/*----------------------------------------------------------------------------*/
+
+static DEFINE_SPINLOCK(status_lock);
+
+static void wqtstatus_setbit(struct wqt_dev *priv, u32 bit)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&status_lock, flags);
+	IMMR_W32(IMR1_OFFSET, IMMR_R32(IMR1_OFFSET) | bit);
+	spin_unlock_irqrestore(&status_lock, flags);
+}
+
+static void wqtstatus_clrbit(struct wqt_dev *priv, u32 bit)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&status_lock, flags);
+	IMMR_W32(IMR1_OFFSET, IMMR_R32(IMR1_OFFSET) & ~bit);
+	spin_unlock_irqrestore(&status_lock, flags);
+}
+
+static int wqtstatus_remote_testbit(struct wqt_dev *priv, u32 bit)
+{
+	return IMMR_R32(OMR1_OFFSET) & bit;
+}
+
+/*----------------------------------------------------------------------------*/
+/* Message Sending and Processing Operations                                  */
+/*----------------------------------------------------------------------------*/
+
+static irqreturn_t wqt_interrupt(int irq, void *dev_id)
+{
+	struct wqt_dev *priv = dev_id;
+	u32 omisr, odr;
+	unsigned long flags;
+
+	omisr = IMMR_R32(OMISR_OFFSET);
+	odr = IMMR_R32(ODR_OFFSET);
+
+	if (!(omisr & 0x8))
+		return IRQ_NONE;
+
+	/* Clear all of the interrupt sources, we'll handle them next */
+	IMMR_W32(ODR_OFFSET, odr);
+
+	/* Lock over all of the handlers, so they cannot get called when
+	 * the code doesn't expect them to be called */
+	spin_lock_irqsave(&priv->irq_lock, flags);
+
+	if (odr & UART_RX_READY_DBELL)
+		priv->handlers.uart_rx_ready_handler(priv);
+
+	if (odr & UART_TX_EMPTY_DBELL)
+		priv->handlers.uart_tx_empty_handler(priv);
+
+	if (odr & NET_RX_PACKET_DBELL)
+		priv->handlers.net_rx_packet_handler(priv);
+
+	if (odr & NET_TX_COMPLETE_DBELL)
+		priv->handlers.net_tx_complete_handler(priv);
+
+	if (odr & NET_START_REQ_DBELL)
+		priv->handlers.net_start_req_handler(priv);
+
+	if (odr & NET_START_ACK_DBELL)
+		priv->handlers.net_start_ack_handler(priv);
+
+	if (odr & NET_STOP_REQ_DBELL)
+		priv->handlers.net_stop_req_handler(priv);
+
+	if (odr & NET_STOP_ACK_DBELL)
+		priv->handlers.net_stop_ack_handler(priv);
+
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+
+	return IRQ_HANDLED;
+}
+
+/* Send a character through the mbox when it becomes available
+ * Blocking, must not be called with any spinlocks held */
+static int do_send_message(struct wqt_dev *priv, const char ch)
+{
+	struct uart_port *port = &priv->port;
+	bool tmp;
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->irq_lock, flags);
+	while (priv->uart_tx_ready != true) {
+		spin_unlock_irqrestore(&priv->irq_lock, flags);
+		wait_event_timeout(priv->uart_tx_wait, priv->uart_tx_ready, HZ);
+
+		spin_lock_irqsave(&port->lock, flags);
+		tmp = priv->uart_open;
+		spin_unlock_irqrestore(&port->lock, flags);
+
+		if (!tmp)
+			return -EIO;
+
+		spin_lock_irqsave(&priv->irq_lock, flags);
+	}
+
+	/* Now the transmitter is free, send the message */
+	IMMR_W32(IMR0_OFFSET, ch);
+	IMMR_W32(IDR_OFFSET, UART_RX_READY_DBELL);
+
+	/* Mark the transmitter busy */
+	priv->uart_tx_ready = false;
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+	return 0;
+}
+
+/* Grab a character out of the uart tx buffer and send it */
+static void uart_tx_work_fn(struct work_struct *work)
+{
+	struct wqt_dev *priv = container_of(work, struct wqt_dev, uart_tx_work);
+	struct uart_port *port = &priv->port;
+	struct circ_buf *xmit = &port->info->xmit;
+	char ch;
+	unsigned long flags;
+
+	spin_lock_irqsave(&port->lock, flags);
+	while (true) {
+
+		/* Check for XON/XOFF (high priority) */
+		if (port->x_char) {
+			ch = port->x_char;
+			port->x_char = 0;
+			spin_unlock_irqrestore(&port->lock, flags);
+
+			if (do_send_message(priv, ch))
+				return;
+
+			spin_lock_irqsave(&port->lock, flags);
+			continue;
+		}
+
+		/* If we're out of chars or the port is stopped, we're done */
+		if (uart_circ_empty(xmit) || uart_tx_stopped(port)) {
+			wqtuart_stop_tx(port);
+			break;
+		}
+
+		/* Grab the next char out of the buffer and send it */
+		ch = xmit->buf[xmit->tail];
+		xmit->tail = (xmit->tail + 1) & (UART_XMIT_SIZE - 1);
+		spin_unlock_irqrestore(&port->lock, flags);
+
+		if (do_send_message(priv, ch))
+			return;
+
+		spin_lock_irqsave(&port->lock, flags);
+	}
+
+	if (uart_circ_chars_pending(xmit) < WAKEUP_CHARS)
+		uart_write_wakeup(port);
+
+	if (uart_circ_empty(xmit))
+		wqtuart_stop_tx(port);
+
+	spin_unlock_irqrestore(&port->lock, flags);
+}
+
+/*----------------------------------------------------------------------------*/
+/* Interrupt Handlers                                                         */
+/*----------------------------------------------------------------------------*/
+
+/* NOTE: All handlers are called with priv->irq_lock held */
+
+static void empty_handler(struct wqt_dev *priv)
+{
+	/* Intentionally left empty */
+}
+
+static void net_start_req_handler(struct wqt_dev *priv)
+{
+	schedule_work(&priv->net_start_work);
+}
+
+static void net_start_ack_handler(struct wqt_dev *priv)
+{
+	complete(&priv->net_start_completion);
+}
+
+static void net_stop_req_handler(struct wqt_dev *priv)
+{
+	schedule_work(&priv->net_stop_work);
+}
+
+static void net_stop_ack_handler(struct wqt_dev *priv)
+{
+	complete(&priv->net_stop_completion);
+}
+
+static void net_tx_complete_handler(struct wqt_dev *priv)
+{
+	tasklet_schedule(&priv->tx_complete_tasklet);
+}
+
+static void net_rx_packet_handler(struct wqt_dev *priv)
+{
+	wqtstatus_setbit(priv, PCINET_NET_RXINT_OFF);
+	netif_rx_schedule(priv->ndev, &priv->napi);
+}
+
+static void uart_rx_ready_handler(struct wqt_dev *priv)
+{
+	wqtuart_rx_char(&priv->port, IMMR_R32(OMR0_OFFSET) & 0xff);
+	IMMR_W32(IDR_OFFSET, UART_TX_EMPTY_DBELL);
+}
+
+static void uart_tx_empty_handler(struct wqt_dev *priv)
+{
+	priv->uart_tx_ready = true;
+	wake_up(&priv->uart_tx_wait);
+}
+
+/*----------------------------------------------------------------------------*/
+/* Interrupt Request / Free Helpers                                           */
+/*----------------------------------------------------------------------------*/
+
+static void do_enable_net_startstop_handlers(struct wqt_dev *priv)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->irq_lock, flags);
+	priv->handlers.net_start_req_handler = net_start_req_handler;
+	priv->handlers.net_start_ack_handler = net_start_ack_handler;
+	priv->handlers.net_stop_req_handler = net_stop_req_handler;
+	priv->handlers.net_stop_ack_handler = net_stop_ack_handler;
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+
+	wqtstatus_setbit(priv, PCINET_NET_STATUS_RUNNING);
+}
+
+static void do_disable_net_startstop_handlers(struct wqt_dev *priv)
+{
+	unsigned long flags;
+
+	wqtstatus_clrbit(priv, PCINET_NET_STATUS_RUNNING);
+
+	spin_lock_irqsave(&priv->irq_lock, flags);
+	priv->handlers.net_start_req_handler = empty_handler;
+	priv->handlers.net_start_ack_handler = empty_handler;
+	priv->handlers.net_stop_req_handler = empty_handler;
+	priv->handlers.net_stop_ack_handler = empty_handler;
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+}
+
+static void do_enable_net_rxtx_handlers(struct wqt_dev *priv)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->irq_lock, flags);
+	priv->handlers.net_rx_packet_handler = net_rx_packet_handler;
+	priv->handlers.net_tx_complete_handler = net_tx_complete_handler;
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+}
+
+static void do_disable_net_rxtx_handlers(struct wqt_dev *priv)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->irq_lock, flags);
+	priv->handlers.net_rx_packet_handler = empty_handler;
+	priv->handlers.net_tx_complete_handler = empty_handler;
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+}
+
+static void do_enable_uart_handlers(struct wqt_dev *priv)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->irq_lock, flags);
+	priv->handlers.uart_rx_ready_handler = uart_rx_ready_handler;
+	priv->handlers.uart_tx_empty_handler = uart_tx_empty_handler;
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+}
+
+static void do_disable_uart_handlers(struct wqt_dev *priv)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->irq_lock, flags);
+	priv->handlers.uart_rx_ready_handler = empty_handler;
+	priv->handlers.uart_tx_empty_handler = empty_handler;
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+}
+
+static int wqt_request_irq(struct wqt_dev *priv)
+{
+	int ret = 0;
+
+	mutex_lock(&priv->irq_mutex);
+
+	if (priv->interrupt_count > 0)
+		goto out_unlock;
+
+	/* Force all handlers to be disabled before attaching the handler */
+	do_disable_net_startstop_handlers(priv);
+	do_disable_net_rxtx_handlers(priv);
+	do_disable_uart_handlers(priv);
+
+	ret = request_irq(priv->pdev->irq,
+			  wqt_interrupt,
+			  IRQF_SHARED,
+			  priv->ndev->name,
+			  priv);
+
+out_unlock:
+	priv->interrupt_count++;
+	mutex_unlock(&priv->irq_mutex);
+
+	return ret;
+}
+
+static void wqt_free_irq(struct wqt_dev *priv)
+{
+	mutex_lock(&priv->irq_mutex);
+	priv->interrupt_count--;
+
+	if (priv->interrupt_count > 0)
+		goto out_unlock;
+
+	free_irq(priv->pdev->irq, priv);
+
+out_unlock:
+	mutex_unlock(&priv->irq_mutex);
+}
+
+/*----------------------------------------------------------------------------*/
+/* Network Startup and Shutdown Helpers                                       */
+/*----------------------------------------------------------------------------*/
+
+/* NOTE: All helper functions prefixed with "do" must be called only from
+ * process context, with priv->net_mutex held. They are expected to sleep */
+
+/* NOTE: queues must be stopped before initializing and uninitializing */
+
+static void do_net_initialize_board(struct wqt_dev *priv)
+{
+	int i;
+	cbd_t __iomem *bdp;
+
+	BUG_ON(!wqtstatus_remote_testbit(priv, PCINET_NET_REGISTERS_VALID));
+
+	/* Fill in RX ring */
+	for (i = 0, bdp = priv->rx_base; i < PH_RING_SIZE; bdp++, i++) {
+		CBDW_SC(bdp, BD_MEM_READY);
+		CBDW_LEN(bdp, PH_MAX_FRSIZE);
+		CBDW_ADDR(bdp, priv->rx_skb_addrs[i]);
+	}
+
+	/* Fill in TX ring */
+	for (i = 0, bdp = priv->tx_base; i < PH_RING_SIZE; bdp++, i++) {
+		CBDW_SC(bdp, BD_MEM_READY);
+		CBDW_LEN(bdp, 0);
+		CBDW_ADDR(bdp, 0x0);
+	}
+}
+
+static void do_net_uninitialize_board(struct wqt_dev *priv)
+{
+	struct sk_buff *skb;
+	dma_addr_t skb_addr;
+	cbd_t __iomem *bdp;
+	int i;
+
+	/* Reset TX ring */
+	for (i = 0, bdp = priv->tx_base; i < PH_RING_SIZE; bdp++, i++) {
+		if (priv->tx_skbs[i]) {
+			skb = priv->tx_skbs[i];
+			skb_addr = priv->tx_skb_addrs[i];
+
+			dma_unmap_single(priv->dev,
+					 skb_addr,
+					 skb->len,
+					 DMA_TO_DEVICE);
+
+			dev_kfree_skb(skb);
+
+			priv->tx_skbs[i] = NULL;
+			priv->tx_skb_addrs[i] = 0x0;
+		}
+
+		CBDW_SC(bdp, BD_MEM_READY);
+		CBDW_LEN(bdp, 0);
+		CBDW_ADDR(bdp, 0x0);
+	}
+}
+
+static void do_net_start_queues(struct wqt_dev *priv)
+{
+	if (priv->net_state == NET_STATE_RUNNING)
+		return;
+
+	dev_dbg(priv->dev, "resetting buffer positions\n");
+	priv->cur_rx = priv->rx_base;
+	priv->cur_tx = priv->tx_base;
+	priv->dirty_tx = priv->tx_base;
+	priv->tx_free = PH_RING_SIZE;
+
+	dev_dbg(priv->dev, "enabling NAPI queue\n");
+	napi_enable(&priv->napi);
+
+	dev_dbg(priv->dev, "enabling tx_complete() tasklet\n");
+	tasklet_enable(&priv->tx_complete_tasklet);
+
+	dev_dbg(priv->dev, "enabling TX queue\n");
+	netif_start_queue(priv->ndev);
+
+	dev_dbg(priv->dev, "carrier on!\n");
+	netif_carrier_on(priv->ndev);
+
+	/* Enable the RX_PACKET and TX_COMPLETE interrupt handlers */
+	do_enable_net_rxtx_handlers(priv);
+
+	priv->net_state = NET_STATE_RUNNING;
+}
+
+static void do_net_stop_queues(struct wqt_dev *priv)
+{
+	if (priv->net_state == NET_STATE_STOPPED)
+		return;
+
+	/* Disable the RX_PACKET and TX_COMPLETE interrupt handlers */
+	do_disable_net_rxtx_handlers(priv);
+
+	dev_dbg(priv->dev, "disabling NAPI queue\n");
+	napi_disable(&priv->napi);
+
+	dev_dbg(priv->dev, "disabling tx_complete() tasklet\n");
+	tasklet_disable(&priv->tx_complete_tasklet);
+
+	dev_dbg(priv->dev, "disabling TX queue\n");
+	netif_tx_disable(priv->ndev);
+
+	dev_dbg(priv->dev, "carrier off!\n");
+	netif_carrier_off(priv->ndev);
+
+	priv->net_state = NET_STATE_STOPPED;
+}
+
+/* Called when we get a request to start our queues and acknowledge */
+static void wqtnet_start_work_fn(struct work_struct *work)
+{
+	struct wqt_dev *priv = container_of(work, struct wqt_dev,
+					    net_start_work);
+
+	mutex_lock(&priv->net_mutex);
+
+	do_net_initialize_board(priv);
+	do_net_start_queues(priv);
+	IMMR_W32(IDR_OFFSET, NET_START_ACK_DBELL);
+
+	mutex_unlock(&priv->net_mutex);
+}
+
+/* Called when we get a request to stop our queues and acknowledge */
+static void wqtnet_stop_work_fn(struct work_struct *work)
+{
+	struct wqt_dev *priv = container_of(work, struct wqt_dev,
+					    net_stop_work);
+
+	mutex_lock(&priv->net_mutex);
+
+	do_net_stop_queues(priv);
+	do_net_uninitialize_board(priv);
+	IMMR_W32(IDR_OFFSET, NET_STOP_ACK_DBELL);
+
+	mutex_unlock(&priv->net_mutex);
+}
+
+/*----------------------------------------------------------------------------*/
+/* SKB Allocation Helpers                                                     */
+/*----------------------------------------------------------------------------*/
+
+static void wqt_cleanup_skbs(struct wqt_dev *priv)
+{
+	struct sk_buff *skb;
+	dma_addr_t skb_addr;
+	int i;
+
+	/* TX ring */
+	for (i = 0; i < PH_RING_SIZE; ++i) {
+		if (priv->tx_skbs[i]) {
+			skb = priv->tx_skbs[i];
+			skb_addr = priv->tx_skb_addrs[i];
+
+			dma_unmap_single(priv->dev,
+					 skb_addr,
+					 skb->len,
+					 DMA_TO_DEVICE);
+
+			dev_kfree_skb(skb);
+
+			priv->tx_skbs[i] = NULL;
+			priv->tx_skb_addrs[i] = 0x0;
+		}
+	}
+
+	/* RX ring */
+	for (i = 0; i < PH_RING_SIZE; ++i) {
+		if (priv->rx_skbs[i]) {
+			skb = priv->rx_skbs[i];
+			skb_addr = priv->rx_skb_addrs[i];
+
+			dma_unmap_single(priv->dev,
+					 skb_addr,
+					 PH_MAX_FRSIZE,
+					 DMA_FROM_DEVICE);
+
+			dev_kfree_skb(skb);
+
+			priv->rx_skbs[i] = NULL;
+			priv->rx_skb_addrs[i] = 0x0;
+		}
+	}
+}
+
+static int wqt_alloc_skbs(struct wqt_dev *priv)
+{
+	struct sk_buff *skb;
+	dma_addr_t skb_addr;
+	int i;
+
+	/* RX ring */
+	for (i = 0; i < PH_RING_SIZE; ++i) {
+		/* Paranoia check */
+		BUG_ON(priv->rx_skbs[i] != NULL);
+		BUG_ON(priv->rx_skb_addrs[i] != 0x0);
+
+		/* Allocate the skb */
+		skb = dev_alloc_skb(PH_MAX_FRSIZE + NET_IP_ALIGN);
+
+		if (skb == NULL)
+			goto out_err;
+
+		skb_reserve(skb, NET_IP_ALIGN);
+
+		/* DMA map the skb */
+		skb_addr = dma_map_single(priv->dev,
+					  skb->data,
+					  PH_MAX_FRSIZE,
+					  DMA_FROM_DEVICE);
+
+		if (dma_mapping_error(skb_addr)) {
+			dev_kfree_skb(skb);
+			goto out_err;
+		}
+
+		priv->rx_skbs[i] = skb;
+		priv->rx_skb_addrs[i] = skb_addr;
+	}
+
+	/* TX ring */
+	for (i = 0; i < PH_RING_SIZE; ++i) {
+		/* Paranoia check */
+		BUG_ON(priv->tx_skbs[i] != NULL);
+		BUG_ON(priv->tx_skb_addrs[i] != 0x0);
+	}
+
+	/* NOTE: the actual initialization of the board happens
+	 * NOTE: in ph_initialize_board(), once the board has
+	 * NOTE: requested to be initialized */
+
+	return 0;
+
+out_err:
+	wqt_cleanup_skbs(priv);
+	return -ENOMEM;
+}
+
+/*----------------------------------------------------------------------------*/
+/* Network Device Operations                                                  */
+/*----------------------------------------------------------------------------*/
+
+static int wqt_open(struct net_device *dev)
+{
+	struct wqt_dev *priv = netdev_priv(dev);
+	int ret;
+
+	/* Check that the other side has registers */
+	if (!wqtstatus_remote_testbit(priv, PCINET_NET_REGISTERS_VALID)) {
+		dev_err(priv->dev, "no driver installed at the other end\n");
+		dev_err(priv->dev, "cowardly refusing to open\n");
+		return -ENOTCONN; /* Transport endpoint is not connected */
+	}
+
+	/* Pretend the cable is unplugged until we are up and running */
+	netif_carrier_off(dev);
+
+	mutex_lock(&priv->net_mutex);
+
+	ret = wqt_alloc_skbs(priv);
+
+	if (ret)
+		goto out_err;
+
+	do_net_initialize_board(priv);
+
+	ret = wqt_request_irq(priv);
+
+	if (ret)
+		goto out_err;
+
+	/* Enable only the network start/stop interrupts */
+	do_enable_net_startstop_handlers(priv);
+
+	/* Check if the other side is running, if not, it will start us.
+	 * Without the interrupt handler installed, there's no way it can
+	 * respond to us anyway */
+	if (!wqtstatus_remote_testbit(priv, PCINET_NET_STATUS_RUNNING))
+		goto out_unlock;
+
+	do_net_initialize_board(priv);
+
+	IMMR_W32(IDR_OFFSET, NET_START_REQ_DBELL);
+	ret = wait_for_completion_timeout(&priv->net_start_completion, 5*HZ);
+
+	if (!ret) {
+		/* Our start request timed out, therefore, the other
+		 * side will start us when it comes back up */
+		dev_dbg(priv->dev, "start timed out\n");
+	} else {
+		do_net_start_queues(priv);
+		ret = 0;
+	}
+
+out_unlock:
+	mutex_unlock(&priv->net_mutex);
+	return 0;
+
+out_err:
+	wqt_cleanup_skbs(priv);
+	mutex_unlock(&priv->net_mutex);
+	return ret;
+}
+
+static int wqt_stop(struct net_device *dev)
+{
+	struct wqt_dev *priv = netdev_priv(dev);
+	int ret;
+
+	mutex_lock(&priv->net_mutex);
+
+	do_net_stop_queues(priv);
+
+	IMMR_W32(IDR_OFFSET, NET_STOP_REQ_DBELL);
+	ret = wait_for_completion_timeout(&priv->net_stop_completion, 5*HZ);
+
+	if (!ret)
+		dev_warn(priv->dev, "other side did not stop on time!\n");
+	else
+		ret = 0;
+
+	do_disable_net_startstop_handlers(priv);
+	wqt_free_irq(priv);
+	do_net_uninitialize_board(priv);
+	wqt_cleanup_skbs(priv);
+
+	mutex_unlock(&priv->net_mutex);
+	return 0;
+}
+
+static int wqt_change_mtu(struct net_device *dev, int new_mtu)
+{
+	if ((new_mtu < 68) || (new_mtu > PH_MAX_MTU))
+		return -EINVAL;
+
+	dev->mtu = new_mtu;
+	return 0;
+}
+
+static int wqt_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct wqt_dev *priv = netdev_priv(dev);
+	dma_addr_t skb_addr;
+	cbd_t __iomem *bdp;
+	int dirty_idx;
+
+	spin_lock_bh(&priv->net_lock);
+
+	bdp = priv->cur_tx;
+	dirty_idx = bdp - priv->tx_base;
+
+	/* This should not happen, the queue should be stopped */
+	if (priv->tx_free == 0 || CBDR_SC(bdp) != BD_MEM_READY) {
+		netif_stop_queue(dev);
+		spin_unlock_bh(&priv->net_lock);
+		return NETDEV_TX_BUSY;
+	}
+
+	skb_addr = dma_map_single(priv->dev,
+				  skb->data,
+				  skb->len,
+				  DMA_TO_DEVICE);
+
+	if (dma_mapping_error(skb_addr)) {
+		dev_warn(priv->dev, "DMA mapping error\n");
+		spin_unlock_bh(&priv->net_lock);
+		return -ENOMEM;
+	}
+
+	BUG_ON(priv->tx_skbs[dirty_idx] != NULL);
+	BUG_ON(priv->tx_skb_addrs[dirty_idx] != 0x0);
+
+	priv->tx_skbs[dirty_idx] = skb;
+	priv->tx_skb_addrs[dirty_idx] = skb_addr;
+
+	CBDW_LEN(bdp, skb->len);
+	CBDW_ADDR(bdp, skb_addr);
+	CBDW_SC(bdp, BD_MEM_DIRTY);
+
+	if (dirty_idx == PH_RING_SIZE - 1)
+		bdp = priv->tx_base;
+	else
+		bdp++;
+
+	priv->cur_tx = bdp;
+	priv->tx_free--;
+	dev->trans_start = jiffies;
+
+	if (priv->tx_free == 0)
+		netif_stop_queue(dev);
+
+	if (!wqtstatus_remote_testbit(priv, PCINET_NET_RXINT_OFF))
+		IMMR_W32(IDR_OFFSET, NET_RX_PACKET_DBELL);
+
+	spin_unlock_bh(&priv->net_lock);
+	return NETDEV_TX_OK;
+}
+
+static void wqt_tx_timeout(struct net_device *dev)
+{
+	struct wqt_dev *priv = netdev_priv(dev);
+
+	dev->stats.tx_errors++;
+	IMMR_W32(IDR_OFFSET, NET_RX_PACKET_DBELL);
+}
+
+static void wqt_tx_complete(unsigned long data)
+{
+	struct net_device *dev = (struct net_device *)data;
+	struct wqt_dev *priv = netdev_priv(dev);
+	struct sk_buff *skb;
+	dma_addr_t skb_addr;
+	cbd_t __iomem *bdp;
+	int do_wake, dirty_idx;
+
+	spin_lock_bh(&priv->net_lock);
+
+	bdp = priv->dirty_tx;
+	do_wake = 0;
+
+	while (CBDR_SC(bdp) == BD_MEM_FREE) {
+		dirty_idx = bdp - priv->tx_base;
+
+		skb = priv->tx_skbs[dirty_idx];
+		skb_addr = priv->tx_skb_addrs[dirty_idx];
+
+		BUG_ON(skb == NULL);
+		BUG_ON(skb_addr == 0x0);
+
+		dev->stats.tx_bytes += skb->len;
+		dev->stats.tx_packets++;
+
+		/* Unmap and free the transmitted skb */
+		dma_unmap_single(priv->dev,
+				 skb_addr,
+				 skb->len,
+				 DMA_TO_DEVICE);
+		dev_kfree_skb_irq(skb);
+
+		priv->tx_skbs[dirty_idx] = NULL;
+		priv->tx_skb_addrs[dirty_idx] = 0x0;
+
+		/* Invalidate the buffer descriptor */
+		CBDW_LEN(bdp, 0);
+		CBDW_ADDR(bdp, 0x0);
+		CBDW_SC(bdp, BD_MEM_READY);
+
+		/* Update the bdp */
+		if (dirty_idx == PH_RING_SIZE - 1)
+			bdp = priv->tx_base;
+		else
+			bdp++;
+
+		if (!priv->tx_free++)
+			do_wake = 1;
+	}
+
+	priv->dirty_tx = bdp;
+
+	spin_unlock_bh(&priv->net_lock);
+
+	if (do_wake)
+		netif_wake_queue(dev);
+}
+
+static int wqt_rx_napi(struct napi_struct *napi, int budget)
+{
+	struct wqt_dev *priv = container_of(napi, struct wqt_dev, napi);
+	struct net_device *dev = priv->ndev;
+	int received = 0;
+	struct sk_buff *skb, *skbn;
+	dma_addr_t skb_addr, skbn_addr;
+	int pkt_len, dirty_idx;
+	cbd_t __iomem *bdp;
+
+	bdp = priv->cur_rx;
+
+	while (CBDR_SC(bdp) == BD_MEM_DIRTY) {
+		dirty_idx = bdp - priv->rx_base;
+
+		skb = priv->rx_skbs[dirty_idx];
+		skb_addr = priv->rx_skb_addrs[dirty_idx];
+
+		BUG_ON(skb == NULL);
+		BUG_ON(skb_addr == 0x0);
+
+		/* Allocate the next rx skb and DMA map it */
+		skbn = dev_alloc_skb(PH_MAX_FRSIZE + NET_IP_ALIGN);
+
+		if (skbn == NULL) {
+			skbn = skb;
+			skbn_addr = skb_addr;
+			dev->stats.rx_dropped++;
+			goto out_noalloc;
+		}
+
+		skb_reserve(skbn, NET_IP_ALIGN);
+
+		skbn_addr = dma_map_single(priv->dev,
+					   skbn->data,
+					   PH_MAX_FRSIZE,
+					   DMA_FROM_DEVICE);
+
+		if (dma_mapping_error(skbn_addr)) {
+			dev_kfree_skb_irq(skbn);
+			skbn = skb;
+			skbn_addr = skb_addr;
+			dev->stats.rx_dropped++;
+			goto out_noalloc;
+		}
+
+		/* DMA unmap the old skb and pass it up */
+		dma_unmap_single(priv->dev,
+				 skb_addr,
+				 PH_MAX_FRSIZE,
+				 DMA_FROM_DEVICE);
+
+		pkt_len = CBDR_LEN(bdp);
+		skb_put(skb, pkt_len);
+		skb->protocol = eth_type_trans(skb, dev);
+#ifdef CONFIG_PCINET_DISABLE_CHECKSUM
+		skb->ip_summed = CHECKSUM_UNNECESSARY;
+#else
+		skb->ip_summed = CHECKSUM_NONE;
+#endif
+		netif_receive_skb(skb);
+		received++;
+		dev->stats.rx_bytes += pkt_len;
+		dev->stats.rx_packets++;
+
+out_noalloc:
+		/* Write the new skb into the buffer descriptor */
+		CBDW_LEN(bdp, PH_MAX_FRSIZE);
+		CBDW_ADDR(bdp, skbn_addr);
+		CBDW_SC(bdp, BD_MEM_FREE);
+
+		priv->rx_skbs[dirty_idx] = skbn;
+		priv->rx_skb_addrs[dirty_idx] = skbn_addr;
+
+		/* Update the bdp */
+		if (dirty_idx == PH_RING_SIZE - 1)
+			bdp = priv->rx_base;
+		else
+			bdp++;
+
+		if (received >= budget)
+			break;
+	}
+
+	priv->cur_rx = bdp;
+
+	/* We have processed all packets that the adapter had, but it
+	 * was less than our budget, stop polling */
+	if (received < budget) {
+		netif_rx_complete(dev, napi);
+		wqtstatus_clrbit(priv, PCINET_NET_RXINT_OFF);
+	}
+
+	IMMR_W32(IDR_OFFSET, NET_TX_COMPLETE_DBELL);
+
+	return received;
+}
+
+/*----------------------------------------------------------------------------*/
+/* UART Device Operations                                                     */
+/*----------------------------------------------------------------------------*/
+
+static unsigned int wqtuart_tx_empty(struct uart_port *port)
+{
+	return TIOCSER_TEMT;
+}
+
+static void wqtuart_set_mctrl(struct uart_port *port, unsigned int mctrl)
+{
+}
+
+static unsigned int wqtuart_get_mctrl(struct uart_port *port)
+{
+	return TIOCM_CAR | TIOCM_DSR | TIOCM_CTS;
+}
+
+static void wqtuart_stop_tx(struct uart_port *port)
+{
+}
+
+static void wqtuart_start_tx(struct uart_port *port)
+{
+	struct wqt_dev *priv = container_of(port, struct wqt_dev, port);
+
+	queue_work(priv->wq, &priv->uart_tx_work);
+}
+
+static void wqtuart_stop_rx(struct uart_port *port)
+{
+	struct wqt_dev *priv = container_of(port, struct wqt_dev, port);
+
+	do_disable_uart_handlers(priv);
+	priv->uart_rx_enabled = false;
+	wqtstatus_clrbit(priv, PCINET_UART_RX_ENABLED);
+}
+
+static void wqtuart_enable_ms(struct uart_port *port)
+{
+}
+
+static void wqtuart_break_ctl(struct uart_port *port, int break_state)
+{
+}
+
+static int wqtuart_startup(struct uart_port *port)
+{
+	struct wqt_dev *priv = container_of(port, struct wqt_dev, port);
+	int ret;
+
+	ret = wqt_request_irq(priv);
+
+	if (ret)
+		return ret;
+
+	do_enable_uart_handlers(priv);
+
+	/* Mark the transmitter and receiver ready */
+	priv->uart_tx_ready = true;
+	priv->uart_rx_enabled = true;
+	wqtstatus_setbit(priv, PCINET_UART_RX_ENABLED);
+
+	/* Let the other side know that we are ready to receive chars now */
+	IMMR_W32(IDR_OFFSET, UART_TX_EMPTY_DBELL);
+	priv->uart_open = true;
+	return 0;
+}
+
+static void wqtuart_shutdown(struct uart_port *port)
+{
+	struct wqt_dev *priv = container_of(port, struct wqt_dev, port);
+
+	wqt_free_irq(priv);
+
+	/* Make sure the uart_tx_work_fn() exits cleanly */
+	priv->uart_open = false;
+	wake_up(&priv->uart_tx_wait);
+}
+
+static void wqtuart_set_termios(struct uart_port *port,
+			       struct ktermios *termios,
+			       struct ktermios *old)
+{
+}
+
+static const char *wqtuart_type(struct uart_port *port)
+{
+	return "WQTUART";
+}
+
+static int wqtuart_request_port(struct uart_port *port)
+{
+	return 0;
+}
+
+static void wqtuart_config_port(struct uart_port *port, int flags)
+{
+}
+
+static void wqtuart_release_port(struct uart_port *port)
+{
+}
+
+static int wqtuart_verify_port(struct uart_port *port,
+			      struct serial_struct *ser)
+{
+	return 0;
+}
+
+static void wqtuart_rx_char(struct uart_port *port, const char ch)
+{
+	struct wqt_dev *priv = container_of(port, struct wqt_dev, port);
+	struct tty_struct *tty;
+	unsigned long flags;
+
+	spin_lock_irqsave(&port->lock, flags);
+
+	if (priv->uart_rx_enabled) {
+		tty = port->info->port.tty;
+		tty_insert_flip_char(tty, ch, TTY_NORMAL);
+		tty_flip_buffer_push(tty);
+	}
+
+	spin_unlock_irqrestore(&port->lock, flags);
+}
+
+static struct uart_ops wqtuart_ops = {
+	.tx_empty	= wqtuart_tx_empty,
+	.set_mctrl	= wqtuart_set_mctrl,
+	.get_mctrl	= wqtuart_get_mctrl,
+	.stop_tx	= wqtuart_stop_tx,
+	.start_tx	= wqtuart_start_tx,
+	.stop_rx	= wqtuart_stop_rx,
+	.enable_ms	= wqtuart_enable_ms,
+	.break_ctl	= wqtuart_break_ctl,
+	.startup	= wqtuart_startup,
+	.shutdown	= wqtuart_shutdown,
+	.set_termios	= wqtuart_set_termios,
+	.type		= wqtuart_type,
+	.release_port	= wqtuart_release_port,
+	.request_port	= wqtuart_request_port,
+	.config_port	= wqtuart_config_port,
+	.verify_port	= wqtuart_verify_port,
+};
+
+static struct uart_driver wqtuart_driver = {
+	.owner		= THIS_MODULE,
+	.driver_name	= driver_name,
+	.dev_name	= "ttyPCI",
+	.major		= 240,
+	.minor		= 0,
+	.nr		= 1,
+};
+
+/*----------------------------------------------------------------------------*/
+/* PCI Subsystem                                                              */
+/*----------------------------------------------------------------------------*/
+
+static int wqt_probe(struct pci_dev *dev, const struct pci_device_id *id)
+{
+	struct net_device *ndev;
+	struct wqt_dev *priv;
+	int ret;
+
+	ndev = alloc_etherdev(sizeof(*priv));
+
+	if (!ndev) {
+		ret = -ENOMEM;
+		goto out_alloc_ndev;
+	}
+
+	pci_set_drvdata(dev, ndev);
+	priv = netdev_priv(ndev);
+	priv->pdev = dev;
+	priv->dev = &dev->dev;
+	priv->ndev = ndev;
+
+	mutex_init(&priv->irq_mutex);
+	spin_lock_init(&priv->irq_lock);
+
+	/* Hardware Initialization */
+	ret = pci_enable_device(dev);
+
+	if (ret)
+		goto out_pci_enable_dev;
+
+	pci_set_master(dev);
+
+	ret = pci_request_regions(dev, driver_name);
+
+	if (ret)
+		goto out_pci_request_regions;
+
+	priv->immr = pci_iomap(dev, 0, 0);
+
+	if (!priv->immr) {
+		ret = -ENOMEM;
+		goto out_iomap_immr;
+	}
+
+	priv->netregs = pci_iomap(dev, 1, 0);
+
+	if (!priv->netregs) {
+		ret = -ENOMEM;
+		goto out_iomap_netregs;
+	}
+
+	priv->rx_base = priv->netregs + PCINET_RXBD_BASE;
+	priv->tx_base = priv->netregs + PCINET_TXBD_BASE;
+
+	ret = dma_set_mask(&dev->dev, 0xcfffffff);
+
+	if (ret) {
+		dev_err(&dev->dev, "Unable to set DMA mask\n");
+		ret = -ENODEV;
+		goto out_set_dma_mask;
+	}
+
+	/* Initialize private data */
+	priv->wq = create_singlethread_workqueue(driver_name);
+
+	if (!priv->wq) {
+		ret = -ENOMEM;
+		goto out_create_workqueue;
+	}
+
+	INIT_WORK(&priv->uart_tx_work, uart_tx_work_fn);
+	init_waitqueue_head(&priv->uart_tx_wait);
+	priv->uart_tx_ready = true;
+
+	tasklet_init(&priv->tx_complete_tasklet, wqt_tx_complete,
+		     (unsigned long)ndev);
+	tasklet_disable(&priv->tx_complete_tasklet);
+	spin_lock_init(&priv->net_lock);
+
+	mutex_init(&priv->net_mutex);
+	priv->net_state = NET_STATE_STOPPED;
+	INIT_WORK(&priv->net_start_work, wqtnet_start_work_fn);
+	INIT_WORK(&priv->net_stop_work, wqtnet_stop_work_fn);
+	init_completion(&priv->net_start_completion);
+	init_completion(&priv->net_stop_completion);
+
+	/* Mask all of the MBOX interrupts */
+	IMMR_W32(OMIMR_OFFSET, 0x1 | 0x2);
+
+	/* Network Device */
+	random_ether_addr(ndev->dev_addr);
+
+	ndev->open              = wqt_open;
+	ndev->stop              = wqt_stop;
+	ndev->change_mtu        = wqt_change_mtu;
+	ndev->hard_start_xmit   = wqt_hard_start_xmit;
+	ndev->tx_timeout        = wqt_tx_timeout;
+	ndev->watchdog_timeo    = HZ/4;
+	ndev->flags            &= ~IFF_MULTICAST;  /* No multicast support */
+#ifdef CONFIG_PCINET_DISABLE_CHECKSUM
+	ndev->features         |= NETIF_F_NO_CSUM; /* No checksum needed */
+#endif
+	ndev->mtu               = PH_MAX_MTU;
+	netif_napi_add(ndev, &priv->napi, wqt_rx_napi, PH_RING_SIZE);
+
+	ret = register_netdev(ndev);
+
+	if (ret)
+		goto out_register_netdev;
+
+	/* UART Device */
+	priv->port.ops = &wqtuart_ops;
+	priv->port.type = PORT_16550A;
+	priv->port.dev = &dev->dev;
+	priv->port.line = 0;
+	spin_lock_init(&priv->port.lock);
+
+	ret = uart_add_one_port(&wqtuart_driver, &priv->port);
+
+	if (ret)
+		goto out_add_uart_port;
+
+	dev_info(priv->dev, "using ethernet device %s\n", ndev->name);
+	dev_info(priv->dev, "using serial device %s%d\n",
+			wqtuart_driver.dev_name, priv->port.line);
+	return 0;
+
+out_add_uart_port:
+	unregister_netdev(ndev);
+out_register_netdev:
+	destroy_workqueue(priv->wq);
+out_create_workqueue:
+out_set_dma_mask:
+	pci_iounmap(dev, priv->netregs);
+out_iomap_netregs:
+	pci_iounmap(dev, priv->immr);
+out_iomap_immr:
+	pci_release_regions(dev);
+out_pci_request_regions:
+	pci_disable_device(dev);
+out_pci_enable_dev:
+	free_netdev(ndev);
+out_alloc_ndev:
+	return ret;
+}
+
+static void wqt_remove(struct pci_dev *dev)
+{
+	struct net_device *ndev = pci_get_drvdata(dev);
+	struct wqt_dev *priv = netdev_priv(ndev);
+
+	uart_remove_one_port(&wqtuart_driver, &priv->port);
+	unregister_netdev(priv->ndev);
+
+	flush_workqueue(priv->wq);
+	destroy_workqueue(priv->wq);
+
+	pci_iounmap(dev, priv->netregs);
+	pci_iounmap(dev, priv->immr);
+	pci_release_regions(dev);
+	pci_disable_device(dev);
+
+	free_netdev(ndev);
+}
+
+#define PCI_DEVID_FSL_MPC8349EMDS 0x0080
+
+/* The list of devices that this module will support */
+static struct pci_device_id wqt_ids[] = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_FREESCALE, PCI_DEVID_FSL_MPC8349EMDS), },
+	{ 0, }
+};
+MODULE_DEVICE_TABLE(pci, wqt_ids);
+
+static struct pci_driver wqt_pci_driver = {
+	.name     = (char *)driver_name,
+	.id_table = wqt_ids,
+	.probe    = wqt_probe,
+	.remove   = wqt_remove,
+};
+
+/*----------------------------------------------------------------------------*/
+/* Module Init / Exit                                                         */
+/*----------------------------------------------------------------------------*/
+
+static int __init wqt_init(void)
+{
+	int ret;
+
+	ret = uart_register_driver(&wqtuart_driver);
+
+	if (ret)
+		goto out_uart_register_driver;
+
+	ret = pci_register_driver(&wqt_pci_driver);
+
+	if (ret)
+		goto out_pci_register_driver;
+
+	return 0;
+
+out_pci_register_driver:
+	uart_unregister_driver(&wqtuart_driver);
+out_uart_register_driver:
+	return ret;
+}
+
+static void __exit wqt_exit(void)
+{
+	pci_unregister_driver(&wqt_pci_driver);
+	uart_unregister_driver(&wqtuart_driver);
+}
+
+MODULE_AUTHOR("Ira W. Snyder <iws@ovro.caltech.edu>");
+MODULE_DESCRIPTION("PCINet/PCISerial Driver for MPC8349EMDS (Host side)");
+MODULE_LICENSE("GPL");
+
+module_init(wqt_init);
+module_exit(wqt_exit);
diff --git a/drivers/net/pcinet_hw.h b/drivers/net/pcinet_hw.h
new file mode 100644
index 0000000..499ba61
--- /dev/null
+++ b/drivers/net/pcinet_hw.h
@@ -0,0 +1,77 @@ 
+/*
+ * Register offsets for the MPC8349EMDS Message Unit from the IMMR base address
+ *
+ * Copyright (c) 2008 Ira W. Snyder <iws@ovro.caltech.edu>
+ *
+ * This file is licensed under the terms of the GNU General Public License
+ * version 2. This program is licensed "as is" without any warranty of any
+ * kind, whether express or implied.
+ */
+
+#ifndef PCINET_HW_H
+#define PCINET_HW_H
+
+/* mpc8349emds message unit register offsets */
+#define OMISR_OFFSET		0x8030
+#define OMIMR_OFFSET		0x8034
+#define IMR0_OFFSET		0x8050
+#define IMR1_OFFSET		0x8054
+#define OMR0_OFFSET		0x8058
+#define OMR1_OFFSET		0x805C
+#define ODR_OFFSET		0x8060
+#define IDR_OFFSET		0x8068
+#define IMISR_OFFSET		0x8080
+#define IMIMR_OFFSET		0x8084
+
+
+/* mpc8349emds pci and local access window register offsets */
+#define LAWAR0_OFFSET		0x0064
+#define LAWAR0_ENABLE		(1<<31)
+
+#define POCMR0_OFFSET		0x8410
+#define POCMR0_ENABLE		(1<<31)
+
+#define POTAR0_OFFSET		0x8400
+
+#define LAWAR1_OFFSET		0x006c
+#define LAWAR1_ENABLE		(1<<31)
+
+#define POCMR1_OFFSET		0x8428
+#define POCMR1_ENABLE		(1<<31)
+
+#define POTAR1_OFFSET		0x8418
+
+
+/* mpc8349emds dma controller register offsets */
+#define DMAMR0_OFFSET		0x8100
+#define DMASR0_OFFSET		0x8104
+#define DMASAR0_OFFSET		0x8110
+#define DMADAR0_OFFSET		0x8118
+#define DMABCR0_OFFSET		0x8120
+
+#define DMA_CHANNEL_BUSY	(1<<2)
+
+#define DMA_DIRECT_MODE_SNOOP	(1<<20)
+#define DMA_CHANNEL_MODE_DIRECT	(1<<2)
+#define DMA_CHANNEL_START	(1<<0)
+
+
+/* mpc8349emds pci and local access window register offsets */
+#define LAWAR0_OFFSET		0x0064
+#define LAWAR0_ENABLE		(1<<31)
+
+#define POCMR0_OFFSET		0x8410
+#define POCMR0_ENABLE		(1<<31)
+
+#define POTAR0_OFFSET		0x8400
+
+
+/* mpc8349emds pci and inbound window register offsets */
+#define PITAR0_OFFSET		0x8568
+#define PIWAR0_OFFSET		0x8578
+
+#define PIWAR0_ENABLED		(1<<31)
+#define PIWAR0_PREFETCH		(1<<29)
+#define PIWAR0_IWS_4K		0xb
+
+#endif /* PCINET_HW_H */